
Huang has another surprise! Nvidia is preparing its first "special edition" server, the GB20, which may launch next year

2024-07-26



New Intelligence Report

Editor: Editorial Department

【New Intelligence Introduction】Foreign media report that Nvidia will launch an AI chip customized for the Chinese market next year and has even developed a server to go with it, a first in Nvidia's history.

The struggle between Nvidia and the US Department of Commerce has escalated again!

Under US export controls, Huang has been constantly coming up with workarounds, and a cat-and-mouse game is unfolding.


The new export control rules have seriously hampered Huang's ability to make money in the Chinese market.

By January this year, a full year after the new export rules took effect, China's share of Nvidia's revenue had fallen to 17%, down from more than 25% two years earlier.

In response, Nvidia has repeatedly launched "special edition" chips customized for China, but these performance-reduced chips have often sold poorly.

Sales of the H20, which got off to a slow start, had only just begun to pick up when Nvidia received some bad news.

Jefferies analysts revealed in a research report last week that the U.S. Department of Commerce will conduct its annual review of semiconductor export restrictions in October and may ban the export of H20 chips.

As oversight keeps escalating, Huang is going all in this time.

Nvidia is not only preparing the B20, its new flagship AI chip for China, but, worried that the chip alone may not be competitive enough, it is also developing a matching server, the GB20.


As the name suggests, the B20 is a variant of the Blackwell B200, which Nvidia announced in March this year.

Billed as the fastest GPU to date, the B200 can perform certain tasks, such as generating chatbot responses, up to 30 times faster than its predecessor, according to Nvidia.

Unfortunately, this has nothing to do with the "special edition" B20 chip...

Designed under the constraints of US export controls, the B20 is destined to be an entry-level product, in stark contrast to the B200 and its industry-leading AI performance.

According to people familiar with the matter, however, although the B20 will be slower than the B200 at AI computation, packing a large number of the chips together in the GB20 server can partly make up for this shortcoming.

This should let Nvidia stay reasonably competitive against Chinese products while still complying with the computing power cap set by US export controls.

A cat-and-mouse game under regulation

Since the end of 2022, Nvidia has reconfigured its chips for the Chinese market several times, aiming at Chinese customers who need chips to develop LLMs and trying to stay attractive to them while complying with US regulations.

In October 2022, the US government banned the sale of Nvidia's A100 and H100, its most advanced AI chips at the time, to China.

A few months later, Nvidia launched two alternatives for the Chinese market, the A800 and H800.

Less than a year later, the U.S. Department of Commerce updated its export controls again, restricting both chips.

Nvidia responded quickly with three new China-specific chips: the H20, L20 and L2.

Of these, the H20 has limited compute but faster interconnect speeds, and its high-bandwidth memory and mature software support help it deliver better real-world performance.

Although initial sales were poor, more and more Chinese customers are choosing to buy this chip.

According to four people directly involved in Nvidia's chip sales in China, Chinese companies have ordered more than 500,000 H20 chips worth nearly $5.8 billion in total, for delivery in 2024.

Research firm SemiAnalysis also made an optimistic estimate that Nvidia will sell more than 1 million H20 chips in China this year, with a value of more than US$12 billion.
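Dividing the reported order values by the unit counts gives a rough average price per H20. This is only back-of-the-envelope arithmetic on the two figures above: the unit counts are lower bounds ("more than") and the dollar totals are approximate, so the results are not list prices.

```python
# Reported figures from the two paragraphs above: (total value in USD, unit count).
# Both are approximate, as reported; they are not official Nvidia numbers.
reported = {
    "Chinese orders (sources)": (5.8e9, 500_000),
    "SemiAnalysis 2024 estimate": (12e9, 1_000_000),
}

for label, (value_usd, units) in reported.items():
    avg_price = value_usd / units  # rough average revenue per chip
    print(f"{label}: about ${avg_price:,.0f} per H20")
```

Both sources land at roughly $11,600 to $12,000 per chip, so the two estimates are broadly consistent with each other.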

But as mentioned above, this business faces a new threat from the U.S. Department of Commerce: when the rules are revised later this year, the H20 may be banned from sale.

The restrictions could take several forms, such as banning specific products, lowering the permitted computing power, or capping memory capacity.

Given the broader backdrop, the United States is widely expected to keep tightening semiconductor export controls.

Sources said the United States wants the Netherlands and Japan to further restrict the supply of chipmaking equipment to China.


With increasingly stringent regulations, more and more "special editions" may appear.


Rumor has it that Nvidia is also developing a new flagship gaming graphics card, the RTX 5090D.


Designed specifically for the Chinese market, the card would succeed the RTX 4090D, the first consumer graphics card built to comply with the export rules.

Where the chip falls short, the server makes up for it

The B20's exact specifications have not been finalized, but it certainly will not break through the "ceiling" set by US GPU export policy.

Its predecessors, the previously launched H20, L20 and L2, all saw an "epic" cut in performance: not only is their compute low, but they also come with only a reduced version of NVLink.

TPP and PD: two metrics that have Nvidia by the throat

The United States imposes strict performance limits on GPUs exported to China, measured with a metric called "Total Processing Power" (TPP).

This metric takes into account the TFLOPS and precision of the GPU's computing power. Specifically, TPP is calculated by multiplying TFLOPS (excluding sparsity) by precision (in bits).
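The TPP formula is simple enough to check by hand. A minimal Python sketch, assuming the definition given here (dense TFLOPS multiplied by bit width) and using the dense FP8 figures quoted in this article; the function name `tpp` and the constant `TPP_LIMIT` are illustrative, not part of any official tool.

```python
TPP_LIMIT = 4800  # current cap quoted in this article

def tpp(dense_tflops: float, bit_width: int) -> float:
    """Total Processing Power: dense (non-sparse) TFLOPS multiplied by bit width."""
    return dense_tflops * bit_width

# Dense FP8 throughput in TFLOPS, as quoted in this article.
gpus = {
    "H100": 1979.0,
    "RTX 4090": 660.6,
    "H20": 296.0,
}

for name, fp8_tflops in gpus.items():
    score = tpp(fp8_tflops, 8)  # FP8 operands are 8 bits wide
    verdict = "over" if score > TPP_LIMIT else "within"
    print(f"{name}: TPP {score:,.1f} ({verdict} the {TPP_LIMIT:,} cap)")
```

This gives about 15,832 for the H100 (the "16,000" cited for Hopper), about 5,285 for the RTX 4090 and 2,368 for the H20, so only the H20 stays under the cap.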

The current limit is 4,800 TPP. What does that mean in practice?

Take Nvidia's previous products as a reference: the Hopper H100 and H200 far exceed this threshold. Both GPUs have a TPP of about 16,000, more than three times the limit.


Even the RTX 4090, with its 660.6 TFLOPS of FP8 compute power, exceeds the limit.

The most powerful Nvidia desktop GPU that stays within the 4,800 TPP limit is the RTX 4090D, which was built specifically to comply with the export restrictions.


Blackwell sets a new bar for compute: its dual-die design can potentially deliver about 4,500 TFLOPS of FP8 compute, a TPP of roughly 36,000, or 7.5 times the export limit.

In other words, the performance of B20 will be less than 1/7 of that of Blackwell B200!
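The 7.5x and 1/7 figures follow directly from the TPP formula. A quick Python check, assuming (as the article does) that the B20 is limited only by the TPP cap and that the 4,500 TFLOPS Blackwell figure is dense FP8:

```python
TPP_LIMIT = 4800
B200_FP8_DENSE_TFLOPS = 4500  # approximate dual-die figure quoted in this article
FP8_BITS = 8

b200_tpp = B200_FP8_DENSE_TFLOPS * FP8_BITS  # TPP = dense TFLOPS x bit width
times_over_limit = b200_tpp / TPP_LIMIT       # how far the B200 sits above the cap
best_case_share = TPP_LIMIT / b200_tpp        # ceiling for a TPP-capped derivative

print(f"B200 TPP: {b200_tpp:,}")                     # B200 TPP: 36,000
print(f"Times over the limit: {times_over_limit}")    # Times over the limit: 7.5
print(f"Best-case B20 share: {best_case_share:.3f}")  # Best-case B20 share: 0.133
```

A share of about 0.133 is indeed below 1/7 (about 0.143), and that is the ceiling: a real B20 could land lower still.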

The B20 also faces a second restriction: "Performance Density" (PD).

This restriction applies specifically to data center GPUs; consumer GPUs are not affected by it.

PD is obtained by dividing the TPP score by the die area in square millimeters. GPUs exported to China must not have a PD higher than 6.0.

Under this metric, Ada Lovelace GPUs such as the RTX 40 series can no longer be used in data centers in China.

Blackwell is clearly superior to Ada Lovelace in terms of density and performance.

In other words, to comply with the rules Nvidia must either strictly limit the B20's performance or use a larger die.
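Because PD divides TPP by die area, there are two ways to get under the density cap: lower the TPP or spread it over more silicon. A simplified Python check of both caps, using the thresholds quoted in this article (TPP 4,800; PD 6.0 for data center parts). The function names and die areas in the example are hypothetical, and treating both limits as "must not exceed" is a simplifying assumption; the actual regulation has more tiers than this.

```python
TPP_LIMIT = 4800.0  # cap on Total Processing Power quoted in this article
PD_LIMIT = 6.0      # data center Performance Density cap quoted in this article

def performance_density(tpp: float, die_area_mm2: float) -> float:
    """PD is the TPP score divided by die area in square millimeters."""
    return tpp / die_area_mm2

def within_caps(tpp: float, die_area_mm2: float, datacenter: bool = True) -> bool:
    """Return True if a chip stays within both caps (simplified reading).

    Consumer GPUs are only held to the TPP cap; data center GPUs must
    also keep PD at or below PD_LIMIT.
    """
    if tpp > TPP_LIMIT:
        return False
    if datacenter and performance_density(tpp, die_area_mm2) > PD_LIMIT:
        return False
    return True

# Hypothetical die: TPP 4,500 on 600 mm^2 gives PD 7.5.
print(within_caps(4500, 600, datacenter=False))  # True: consumer parts ignore PD
print(within_caps(4500, 600))                    # False: PD 7.5 is above 6.0
# Lever 1: cut TPP on the same die (PD about 5.83).
print(within_caps(3500, 600))                    # True
# Lever 2: keep TPP, use a larger die (PD 5.0).
print(within_caps(4500, 900))                    # True
```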

According to Tom's Hardware, the B20 will be the successor to Nvidia's A30 and H20 entry-level AI GPUs.

Take the H20 as an example: its FP8 compute is only 296 TFLOPS, giving a TPP of 2,368 and a PD of just 2.90, compared with 1,979 TFLOPS for the H100/H200.


Meanwhile, A30 has a TPP score of 2640 and a PD score of 3.20, which is slightly higher than H20.

This shows that Nvidia still has some headroom to improve the performance of its AI GPUs for the Chinese market, but not much.

In the best case, Nvidia might aim for a GPU with a TPP between 4,000 and 4,500 and a die size of 800 square millimeters.
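Plugging the best-case numbers into the PD formula shows why that range works. A small Python sweep, using the caps quoted in this article and treating both as "must not exceed" (a simplification):

```python
TPP_LIMIT = 4800
PD_LIMIT = 6.0
DIE_AREA_MM2 = 800  # die size from the best-case scenario above

# PD = TPP / die area; check each candidate TPP against both caps.
for candidate_tpp in range(4000, 4501, 100):
    pd = candidate_tpp / DIE_AREA_MM2
    ok = candidate_tpp <= TPP_LIMIT and pd <= PD_LIMIT
    print(f"TPP {candidate_tpp}: PD {pd:.3f}, {'within' if ok else 'over'} both caps")

# Highest TPP the PD cap alone allows on this die.
max_tpp_from_pd = PD_LIMIT * DIE_AREA_MM2
print(f"PD cap alone allows TPP up to {max_tpp_from_pd:,.0f}")
```

On an 800 mm² die the PD cap works out to a TPP of 4,800, the same as the TPP cap itself, so a target of 4,000 to 4,500 (PD 5.0 to 5.625) leaves some margin under both.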

GB20: All-out efforts to save B20's performance

To improve the efficiency of the upcoming B20, Nvidia is reusing strategies from the H20, such as increasing memory capacity, since memory chips are not subject to current export controls.

According to two people involved in developing the server, Nvidia is working to speed up data transfer between memory and the B20 processor so that large datasets can be processed more quickly.

In addition, Nvidia will bring its NVLink technology (which enables fast communication between chips) and its cooling solutions to the GB20 rack design.

The two people added that this is expected to raise the B20's utilization and make GB20 computing clusters more effective at powering AI.

The GB20 system will let customers run AI training and inference more efficiently by running many chips in parallel.
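How many capped chips would it take to match one B200 on raw throughput? A back-of-the-envelope Python sketch, assuming a hypothetical B20 sitting exactly at the 4,800 TPP cap (600 dense FP8 TFLOPS), the approximate 4,500 TFLOPS B200 figure quoted earlier, and a hypothetical parallel scaling efficiency. The helper `chips_to_match` is illustrative only, and the model ignores memory, interconnect and power, which is where the GB20 design is meant to help.

```python
import math

B200_FP8_TFLOPS = 4500.0  # approximate B200 figure quoted in this article
TPP_LIMIT = 4800.0
FP8_BITS = 8

# Upper bound on a compliant chip's dense FP8 throughput: TPP cap / bit width.
capped_fp8_tflops = TPP_LIMIT / FP8_BITS  # 600 TFLOPS

def chips_to_match(target_tflops: float, per_chip_tflops: float, efficiency: float) -> int:
    """Chips needed to reach target throughput at a given scaling efficiency in (0, 1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Round up: a fraction of a chip still means one more chip.
    return math.ceil(target_tflops / (per_chip_tflops * efficiency))

# Hypothetical efficiencies; real multi-chip scaling depends on the workload.
for eff in (1.0, 0.9, 0.8):
    n = chips_to_match(B200_FP8_TFLOPS, capped_fp8_tflops, eff)
    print(f"scaling efficiency {eff:.0%}: {n} capped chips per B200")
```

Even with perfect scaling, eight capped chips are needed to equal one B200's FP8 throughput, and every point of lost efficiency pushes that number up, which is why faster memory and NVLink matter so much for the GB20.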

Before the GB20 was revealed, Tom's Hardware boldly predicted that the B20 would be a hard sell.

But now with the support of GB20, there seems to be new hope.

References:

https://www.tomshardware.com/pc-components/gpus/nvidia-preparing-a-china-focused-variant-of-its-b200-blackwell-ai-gpu-to-comply-with-us-export-regulations

https://www.theinformation.com/articles/nvidias-new-china-chip-has-special-server-design-to-skirt-u-s-rules