Musk launches the "world's most powerful AI cluster": integrating 100,000 NVIDIA H100 GPUs!

2024-07-23

On July 23, Elon Musk, CEO of Tesla, "X", and xAI, announced on the "X" platform that he has brought online the "world's most powerful AI cluster," intended to produce the "world's most powerful AI" by December of this year. The system integrates 100,000 Nvidia H100 GPUs on a single fabric.

Musk said that starting at around 4:20 a.m. local time, thanks to the efforts of the xAI team, the X team, Nvidia, and supporting companies, the Memphis supercomputing facility, dubbed the "Supercluster," began normal operation. With 100,000 liquid-cooled H100s on a single RDMA fabric, it is the world's most powerful AI training cluster.

In May, Musk said he would open a supercomputing factory by the fall of 2025, and at the time he rushed to start work on the Supercluster, which required purchasing Nvidia's "Hopper" H100 GPUs. This suggested the tech tycoon did not have the patience to wait for the H200 chip to launch, let alone the upcoming Blackwell-based B100 and B200 GPUs, even though the newer Nvidia Blackwell data center GPUs are expected to ship before the end of 2024.

According to the latest news, the supercomputing factory originally scheduled to open in the fall of 2025 now appears to have come online nearly a year ahead of schedule, though it is too early to draw firm conclusions; the sources who spoke to Reuters and The Information earlier this year may simply have been mistaken about the project's timing. In any case, with the xAI Supercluster up and running, the question of why xAI did not wait for more powerful, next-generation Nvidia GPUs has been answered.

Charles Liang, CEO of Supermicro, which provides much of the hardware for xAI, also commented on Musk’s post, saying, “It’s been great working with Elon’s Memphis team! To achieve our goals, our execution has to be as perfect as possible, as fast as possible, as efficient as possible, and as environmentally friendly as possible — a lot of hard work.”

In a subsequent tweet, Musk explained that the new Supercluster will "train the world's most powerful AI in every way." Based on the previous statement of intent, xAI's 100,000-GPU H100 installation will now train the Grok 3 model. Musk said the training phase for the improved LLM should be completed "before December this year."

In GPU count, the new xAI Supercluster surpasses the most powerful current supercomputers, such as Frontier (37,888 AMD GPUs), Aurora (60,000 Intel GPUs), and Microsoft's Eagle (14,400 Nvidia H100 GPUs).

Editor: Core Intelligence - Rurouni Ken