news

A computer tablet forms an AI cluster, which can run a 400B model at home, and has 2.5K stars on GitHub

2024-07-22

한어Русский языкEnglishFrançaisIndonesianSanskrit日本語DeutschPortuguêsΕλληνικάespañolItalianoSuomalainenLatina

  • Cressey from Aofei Temple
    Quantum Bit | Public Account QbitAI

Without H100, three Apple computers can drive the 400B large model.

The hero behind this is an open source distributed AI reasoning framework on GitHub, which has received 2.5k stars.



Using this framework, you can build your own AI computing cluster using everyday devices such as iPhone and iPad in just a few minutes.



This framework is called exo. Different from other distributed reasoning frameworks, it uses a p2p connection method, and the device can automatically join the cluster by connecting it to the network.

The developer used the exo framework to connect two MacBook Pros and a Mac Studio, achieving a computing speed of 110TFLOPS.

At the same time, the developer said that he is ready for the upcoming Llama3-405B.



EXO officials also announced that they will provide support for Llama3-405B as soon as possible (day 0).



And it's not just computers. Exo can also allow devices such as iPhone and iPad to join the local computing network, and even Apple Watch can also be absorbed.



With the iteration of versions, the exo framework is no longer limited to Apple (initially only supported MLX), and some people have also pulled Android phones and 4090 graphics cards into the cluster.



Complete configuration in as little as 60 seconds

Unlike other distributed reasoning frameworks, exo does not use a master-worker architecture.Peer-to-peer (p2p)Connect the device to the

As long as the device is connected to the same local area network, it can automatically join the exo computing network to run the model.

When splitting the model across devices, exo supports different partitioning strategies, the default is ring memory weighted partitioning.

This runs inference in a ring, with each device running a number of model layers proportional to the device memory.



And the whole processAlmost no manual configuration requiredAfter installation and startup, the system will automatically connect to devices running in the local area network, and will also support Bluetooth connection in the future.

In one of the author's videos, the configuration was completed on two new MacBooks in just about 60 seconds.

You can see that after about 60 seconds, the program has started running in the background.



In addition, as can be seen from the picture above, EXO also supports Tiny ChatGraphical interface, and is also compatible with OpenAIAPI

However, such operations can only be performed on the tail node in the cluster.



Currently, exo supports Apple's MLX framework and open source machine learning frameworks.tinygrad, the adaptation of llama.cpp is also in progress.

The only drawback is that since the iOS implementation cannot keep up with Python, many problems have occurred in the program. The author has temporarily taken the mobile and iPad versions of exo offline. If you really want to try it, you can email the author to request it.



Netizen: Is it really that useful?

This method of using local devices to run large models has also sparked extensive discussion on HakerNews.

The advantages of localized operation are, on the one hand, better privacy protection, and on the other hand, the model can be accessed offline and also supports personalized customization.



Some people also pointed out that using existing equipment to build clusters for large-model calculations would have lower long-term usage costs than cloud services.



However, many people have expressed their doubts about the specific project of EXO.

First, some netizens pointed out that the computing power of existing old equipment is orders of magnitude lower than that of professional service providers. It’s okay to play with it out of curiosity, but if you want to achieve cutting-edge performance, the cost cannot be compared with large platforms.



Some people also said that the devices used by the author for demonstration are all high-end hardware. A Mac device with 32GB of memory may cost more than US$2,000. This price is not as good as buying two 3090s.

He even thinks that since Apple is involved, it can be said that it has little to do with "cheap".



This brings up another question - what devices are the exo framework compatible with? Is it only compatible with Apple?

The netizen's question was more direct, asking directly whether it supports Raspberry Pi.

The author replied that it is theoretically possible, but it has not been tested yet and will be tried next.



In addition to the computing power of the device itself, some people also added that the speed bottleneck of network transmission will also limit the performance of the cluster.

In this regard, the framework author personally explained:

What needs to be transmitted in exo is a small activation vector, not the entire model weights.
For the Llama-3-8B model, the activation vector is about 10KB; for Llama-3-70B, it is about 32KB.
Local network latency is typically low (<5ms) and does not significantly impact performance.



The author said that the framework currently supports tinygrad, so although the test is mainly carried out on Mac devices, (theoretically) all devices that can run tinygrad are supported.

The framework is still in the experimental stage, and the future goal is to make it as simple as Dropbox (a network disk).



BTW, EXO officials have also listed some shortcomings that they are currently planning to solve and have issued a public reward. Those who solve these problems will receive a bonus ranging from US$100 to US$500.



GitHub:
https://github.com/exo-explore/exo
Reference Links:
https://x.com/ac_crypto/status/1814912615946330473