Nvidia's new AI chip delayed by at least three months, affecting giants such as Microsoft, Google and Meta

2024-08-03

Tencent Technology News, August 3: According to foreign media reports, Nvidia's latest artificial intelligence chip project has run into a design flaw. Two sources involved in the production of the chips and server hardware said the flaw will delay the release by at least three months, and possibly longer.

The delay is expected to affect several major customers, including Meta, Google and Microsoft, which together have pre-ordered tens of billions of dollars worth of the chips.

Nvidia informed Microsoft, one of its largest customers, and another major cloud service provider this week that delivery of its most advanced artificial intelligence chips in its Blackwell series will be delayed, according to a Microsoft employee and another person familiar with the matter.

Nvidia officially launched the Blackwell series in March this year, and in May CEO Jensen Huang said the company planned to ship the chips in large volumes within the year. That plan suffered a setback when the design problems emerged. According to people directly involved in the production of Blackwell chips, Nvidia is now working with its chip manufacturer, TSMC, on a new round of test production to overcome the technical obstacles.

As a result, large-scale shipments of Blackwell chips are expected to slip to the first quarter of next year. Once cloud service providers receive the chips, they will need an additional three months to deploy and activate large chip clusters.

Design and production challenges have heightened concerns about Nvidia’s position, especially as the U.S. Justice Department investigates complaints of anticompetitive behavior. Still, Nvidia has maintained its lead thanks to its chips’ significant performance advantages.

Shareholders have high hopes for the Blackwell series. Analysts at Keybanc Capital Markets predict that the chips will drive Nvidia's data center revenue from $47.5 billion in 2024 to more than $200 billion in 2025. Jensen Huang said on a May earnings call: "We will witness Blackwell bringing in rich revenue this year!"

Nvidia’s AI server chips, called graphics processing units (GPUs), have long been a core driver of conversational and video AI efforts by developers like OpenAI, and they have helped cloud service giants like Microsoft grow sales by renting out the chips to other developers.

If the upcoming artificial intelligence chips B100, B200 and GB200 are delayed by at least three months, it may disrupt the deployment plans of some customers and prevent them from running large chip clusters in data centers as scheduled in the first quarter of 2025.

Big-name customers such as Microsoft, OpenAI and Meta are eagerly awaiting Nvidia's new chips, which they plan to use to develop the next generation of large language models, the core software behind ChatGPT, the Meta AI assistant and a range of new automation features.

The companies say they need more computing power to achieve leaps in software performance: answering complex queries more accurately, automating multi-step tasks, and generating highly realistic video. They are pinning their hopes on Nvidia's next generation of artificial intelligence chips, especially when combined into supercomputer clusters.

An Nvidia spokesperson was cautious about the delays, saying only that "production progress will be accelerated as planned" later this year.

Officials from Microsoft, Google, Amazon and Meta declined to comment. A TSMC spokesperson did not respond to a request for comment.

Nvidia’s major customers have high hopes for the GB200 chip and have already made ambitious plans around it. Google, Meta and Microsoft recently disclosed sharp increases in their spending on data centers and artificial intelligence chips. The disclosures temporarily pushed up Nvidia’s stock price and also triggered broad market debate about the payback period and profit prospects of these investments.

Big Blackwell orders

According to two sources in the chip production field, Google has ordered more than 400,000 GB200 chips. Together with related server hardware, the total order value may far exceed $10 billion, though the delivery schedule is still unclear. For comparison, Google's spending on chips, equipment and assets this year has climbed to about $50 billion, up more than 50% year-on-year.

Meta has also placed a large order, with a total value estimated at no less than $10 billion. Microsoft has not disclosed its total order size but has increased its orders by 20% in recent weeks. According to people directly familiar with the matter, Microsoft is preparing to equip OpenAI with 55,000 to 65,000 GB200 chips by the first quarter of 2025.

Microsoft had originally planned to deliver Blackwell-based servers to OpenAI by January next year, but now may need to adjust the schedule to March or early spring, a person familiar with the matter said.

Two insiders involved in manufacturing the Blackwell chip said the design problems emerged in recent weeks, when TSMC engineers discovered critical defects while preparing for mass production. Specifically, the problem centers on the processor die connected to the two Blackwell GPUs. The failure of this silicon component directly reduced yield, that is, the number of chips TSMC could supply to Nvidia. Problems of this kind often prompt companies to suspend production.

Nvidia is therefore urgently revising the design and must rerun production tests at TSMC to confirm that the problem is resolved before large-scale production begins.

According to people familiar with the matter, Nvidia has told at least one cloud service provider that, to work around the problem and speed up delivery, it is considering launching an alternative version equipped with only a single Blackwell chip.

Unusual delays

TSMC originally planned to start mass production of Blackwell chips in the third quarter, with mass shipments to Nvidia customers beginning in the fourth quarter. Mass production is now expected to be postponed to the fourth quarter, and if no new problems arise, servers will ship in large quantities in the following quarters.

Chip production delays are not uncommon. According to people familiar with the matter, Nvidia also encountered delays with early versions of its flagship GPU in 2020, but its market influence was smaller at the time and customers were not counting as heavily on their orders, so the immediate impact on revenue from data center and chip investments was limited.

However, it is unusual to find a major design flaw just before mass production. Chip design teams usually work closely with manufacturers such as TSMC through multiple rounds of production testing and simulation to ensure that a product is mature and reliable before accepting large-scale orders.

According to TSMC employees, the company, the world's leading chip manufacturer, has rarely halted a production line to redesign a product that was about to enter mass production. TSMC had already allocated production capacity for the planned large-scale output of the GB200 chip, and that capacity will sit idle until the problem is resolved.

The design flaw has also slowed production and delivery of Nvidia's NVLink server racks, because the company has to wait for new chip samples before it can complete the rack design. (Compiled by Jinlu)