
Is a $3 trillion chip company also struggling to survive?

2024-07-30


Nvidia is planning to develop special AI chips for the Chinese market.

According to the latest Reuters report, people familiar with the matter said that Nvidia is developing a new flagship AI chip for the Chinese market that complies with current US export controls, adding another member to its lineup of China-specific chips.

It is worth noting that NVIDIA unveiled its "Blackwell" series in March this year, with mass production expected later this year. According to NVIDIA, the B200 is 30 times faster than its predecessor on some tasks, making it one of the most powerful AI chips currently available.

This new special flagship chip is also related to B200. Sources said that NVIDIA will cooperate with Inspur Group, one of its main distribution partners in China, to launch and distribute this chip tentatively named "B20". Judging from the name, it may have some of the features of B200.

Counting the B20, NVIDIA has launched or planned seven or eight special chips for the Chinese market in just over a year.

A800 and H800

On October 7, 2022, the U.S. government announced a series of export control measures, including cutting off the supply of certain semiconductor chips and chip manufacturing equipment to China.

Beyond production equipment such as lithography machines, the rules also restrict China's access to high-performance computing and artificial intelligence chips made on advanced processes. They bar American companies such as Nvidia and AMD from selling such chips to China, and restrict Chinese AI chip companies from taping out wafers at overseas fabs that use American technology.

Both Nvidia and AMD were affected by this export restriction.

After the restrictions were introduced, Nvidia said the ban affected its A100 and H100 chips, which are designed to accelerate machine learning tasks, and could hinder completion of the flagship H100 due for release in 2022. It noted that sales of the affected chips in China had reached $400 million that quarter, revenue that would be lost if Chinese customers chose not to buy Nvidia's alternative products.

So how specifically do U.S. export restrictions limit Nvidia chips?

Under the export rules for advanced computing integrated circuits (ECCN 3A090 and 4A090) issued by the U.S. Department of Commerce on October 7, 2022, an item is controlled if it meets the following conditions:

a. Integrated circuits, excluding volatile memory, whose aggregate bidirectional data rate over all inputs and outputs is 600 GB/s or more (or can be programmed to reach it), and which meet any of the following:

a.1. One or more digital processor units that execute machine instructions, where the bit length per operation multiplied by the processing performance in TOPS, summed over all processor units, is 4800 or more;

a.2. One or more digital "raw computing units" (excluding units that assist in executing the machine instructions counted under 3A090.a.1), where the bit length per operation multiplied by the processing performance in TOPS, summed over all computing units, is 4800 or more;

a.3. One or more analog, multi-value, or multi-level "raw computing units", where the processing performance in TOPS multiplied by 8, summed over all computing units, is 4800 or more;

a.4. Any combination of digital processor units and "raw computing units" whose combined total, as calculated under 3A090.a.1, 3A090.a.2, and 3A090.a.3, is 4800 or more.

The integrated circuits described in 3A090.a include graphics processing units (GPUs), tensor processing units (TPUs), neural processors, in-memory processors, vision processors, text processors, coprocessors/accelerators, adaptive processors, field-programmable logic devices (FPLDs), and application-specific integrated circuits (ASICs).

The key parameter here is clearly the chip interconnect speed. Under this rule, Nvidia's then best-selling A100, with a chip-to-chip transfer rate of 600 GB/s, fell squarely within the restricted range. The Commerce Department may even have set the threshold with the A100 in mind.

To cope with the export controls, NVIDIA quickly developed a replacement for the A100: the A800. The US ban was formally issued on October 7, 2022, and a month later NVIDIA had an A800 that complied with the new rules. Where products were once adapted to local conditions, they are now adapted to policy.

According to the specifications, the NVIDIA A800 uses the same Ampere architecture as the A100 GPU. It comes in three versions: 40 GB and 80 GB PCIe cards and an 80 GB SXM module. These GPUs deliver up to 9.7 TFLOPS of FP64, 19.5 TFLOPS of FP64 Tensor Core, 19.5 TFLOPS of FP32, 156 TFLOPS of TF32 (312 TFLOPS with sparsity), 312 TFLOPS of BFLOAT16 (624 TFLOPS with sparsity), and 624 TOPS of INT8 (1,248 TOPS with sparsity). The 40 GB version has HBM2 memory with up to 1.555 TB/s of bandwidth, while the 80 GB version uses HBM2e with up to 2 TB/s.

Of course, to meet the restrictions, the interconnect bandwidth had to be cut from 600 GB/s to 400 GB/s. In a statement to Reuters, an Nvidia spokesperson said: "The A800 GPU went into production in the third quarter and is another alternative product to the A100 GPU for customers in China. The A800 meets the U.S. government's clear test for reduced export control and cannot be programmed to exceed it."
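Why a bandwidth cut alone was enough can be checked against the October 2022 test summarized above. The sketch below is a simplification under stated assumptions: it treats the chip as a single digital processing unit, scores it by the highest TOPS × bit-length product across the precisions it supports, and uses hypothetical function names. The performance figures are the dense A800 numbers quoted above (identical to the A100's), and the bandwidths are the 600 GB/s and 400 GB/s from the text.

```python
# Simplified sketch of the October 2022 ECCN 3A090 test as described in the article.
# Assumptions: one digital processing unit; its score is the highest TOPS x bit-length
# product across supported precisions. Function names are hypothetical.

BANDWIDTH_THRESHOLD_GBS = 600  # aggregate bidirectional I/O rate, GB/s
PERFORMANCE_THRESHOLD = 4800   # TOPS x bit length


def performance_score(tops_by_bits: dict[int, float]) -> float:
    """Highest TOPS x bit-length product across the supported precisions."""
    return max(tops * bits for bits, tops in tops_by_bits.items())


def controlled_oct_2022(io_bandwidth_gbs: float, tops_by_bits: dict[int, float]) -> bool:
    """Both the bandwidth and the performance condition must hold to be controlled."""
    return (io_bandwidth_gbs >= BANDWIDTH_THRESHOLD_GBS
            and performance_score(tops_by_bits) >= PERFORMANCE_THRESHOLD)


# Dense figures quoted in the text: INT8 624 TOPS, BF16 312 TFLOPS, FP64 Tensor Core 19.5 TFLOPS
ampere_dense = {8: 624.0, 16: 312.0, 64: 19.5}

print(performance_score(ampere_dense))         # 4992.0
print(controlled_oct_2022(600, ampere_dense))  # A100 interconnect: True
print(controlled_oct_2022(400, ampere_dense))  # A800 interconnect: False
```

The compute score is unchanged at 4992, above the 4800 line, so dropping the interconnect below 600 GB/s was the only change needed to fall outside the 2022 rule.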

"The A800 looks to be a repackaged A100 GPU designed to circumvent recent Commerce Department trade restrictions," commented Wayne Lam, an analyst at CCS Insight, while noting that eight is a lucky number in China.

“China is an important market for Nvidia, and reconfiguring products to avoid trade restrictions makes good business sense,” Lam said, adding that the A800's inter-chip communication is significantly reduced, which matters for data centers that use thousands of chips.

Nvidia then did the same with the H100, releasing the H800. Just as it had cut the A100's 600 GB/s interconnect to 400 GB/s, it reportedly cut the H100's chip interconnect rate roughly in half, from 800 GB/s to 400 GB/s. The H800 therefore takes a bigger hit than the A800: the A800's interconnect was cut by 33%, the H800's by a full 50%.

At the time, an Nvidia spokesperson declined to say how the H800 for the Chinese market differed from the H100, saying only that "our 800 series products fully comply with export control regulations."

While overseas companies were frantically buying A100s and H100s, Chinese companies could only choose the lower-spec A800 and H800. To some extent, Nvidia's special edition chips have held back the development of large AI models in China.

H20 and RTX 4090D

For Chinese companies, the A800 and H800 have both advantages and disadvantages. On the downside, with interconnect bandwidth cut, both chips perform somewhat worse and training is much slower. On the upside, both can be ordered through official channels, although Chinese companies end up spending more on chips than foreign companies do.

But the A800 and H800 lasted barely a year. On October 17, 2023, the U.S. Department of Commerce issued new rules that supplemented and updated the October 7, 2022 export controls on advanced computing integrated circuits, semiconductor manufacturing equipment, and items supporting supercomputing applications and end uses.

Chief among the changes are the control parameters. The interim final rule removes "interconnect bandwidth" as a parameter for identifying restricted chips under ECCN 3A090; instead, a chip is restricted if it meets the criteria of either of two subparagraphs, 3A090.a or 3A090.b.

According to the Commerce Department document, the revised 3A090.a controls integrated circuits with one or more digital processing units that have either a "total processing performance" of 4800 or more, or a "total processing performance" of 1600 or more together with a "performance density" of 5.92 or more, where performance density is total processing performance divided by die area in square millimeters. The new 3A090.b controls integrated circuits with one or more digital processing units that meet either of the following: a "total processing performance" of 2400 or more but less than 4800 together with a "performance density" of 1.6 or more but less than 5.92; or a "total processing performance" of 1600 or more together with a "performance density" of 3.2 or more but less than 5.92.
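The two-tier thresholds are easier to follow as a small classifier. This is a hedged sketch, not a compliance tool: it assumes a single "total processing performance" (TPP) value per chip, takes TPP as 2 × MacTOPS × bit length (our reading of the rule's definition), computes performance density as TPP divided by die area in mm², and ignores everything else in the rule, such as data center designation and license exceptions. Function names and die areas are illustrative; the first example's TPP of 4992 corresponds to the dense A800 BF16 figure quoted earlier (312 TFLOPS, i.e. 156 tera-MACs per second at 16 bits).

```python
# Sketch of the October 2023 ECCN 3A090.a / 3A090.b thresholds as summarized in the text.
# Assumptions: one TPP value per chip; TPP = 2 x MacTOPS x bit length;
# performance density = TPP / die area (mm^2). Names and die areas are illustrative.


def total_processing_performance(mac_tops: float, bit_length: int) -> float:
    """TPP as 2 x (tera multiply-accumulates per second) x operation bit length."""
    return 2 * mac_tops * bit_length


def classify_oct_2023(tpp: float, die_area_mm2: float) -> str:
    density = tpp / die_area_mm2
    # 3A090.a: TPP >= 4800, or TPP >= 1600 with density >= 5.92
    if tpp >= 4800 or (tpp >= 1600 and density >= 5.92):
        return "3A090.a"
    # 3A090.b: 2400 <= TPP < 4800 with 1.6 <= density < 5.92,
    #          or TPP >= 1600 with 3.2 <= density < 5.92
    if (2400 <= tpp < 4800 and 1.6 <= density < 5.92) or (
        tpp >= 1600 and 3.2 <= density < 5.92
    ):
        return "3A090.b"
    return "not controlled"


print(total_processing_performance(156, 16))  # 4992

for tpp, area in [(4992, 826), (3000, 1000), (1700, 250), (2000, 800)]:
    print(tpp, area, classify_oct_2023(tpp, area))
# 4992 826 3A090.a
# 3000 1000 3A090.b
# 1700 250 3A090.a
# 2000 800 not controlled
```

The third case shows the new density test at work: a chip with modest TPP (1700) is still caught under 3A090.a because it packs that performance into a small die.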

In addition, the rule creates a new license exception, "License Exception Notified Advanced Computing" (NAC), for consumer-grade ICs with AI capabilities below the restriction thresholds. The exception covers two types of products: chips designed or marketed for use in data centers, and chips not designed or marketed for data centers that have a "total processing performance" of 4800 or more.

Compared with the October 7, 2022 rules, the new rules once again widen the scope of control. Under the twin tests of total processing performance and performance density, products fall under export control whether or not they have been cut down. The A800 and H800 are affected, and so are other NVIDIA products: the L40 and L40S for the inference market and the consumer RTX 4090 are also banned from sale.

This was a heavy blow to Nvidia: the mainstream products it was selling at the time could no longer be sold in China because of the export controls. In previous years, the Chinese market had accounted for roughly 20% to 25% of Nvidia's data center revenue. In the fourth quarter of fiscal 2024, after the export controls took effect, China's share fell to single digits.

With no other option, Nvidia reached for the knife once again.

On November 16, 2023, one month after the new rules were released, NVIDIA launched GPU chips specifically for the Chinese market - H20, L20, and L2. H20 is based on NVIDIA's Hopper architecture, while L20 and L2 are based on the Ada architecture.

The L20 and L2 are derived from the L40 and L4 respectively. Built on an older architecture that sees little use in training and inference, they drew little attention. The H20 is more interesting. Because the new rules no longer limit interconnect speed, it keeps the full 900 GB/s NVLink, but its compute has been heavily cut. According to analyst Dylan Patel, even if the H20 reaches 90% actual utilization, its performance in a real multi-GPU interconnect environment is still only about 50% of the H100's.

For the consumer RTX 4090, NVIDIA launched a replacement, the RTX 4090D, in December last year. To comply with US export controls, this special chip has been cut down in CUDA core count and power: CUDA cores drop from 16,384 to 14,592, about 11% fewer, and power falls from 450W to 425W, a reduction of about 5.6%. All other core specifications remain unchanged.
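Percentage cuts depend on which value is used as the base. The short check below measures each reduction against the original RTX 4090 value, using the 450W and 425W power figures from the text and the published CUDA core counts of 16,384 (RTX 4090) and 14,592 (RTX 4090D); the helper name is illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Reduction relative to the original (before) value, in percent."""
    return (before - after) / before * 100


cores = percent_reduction(16384, 14592)  # RTX 4090 -> RTX 4090D CUDA cores
power = percent_reduction(450, 425)      # RTX 4090 -> RTX 4090D power, watts
print(f"CUDA cores: -{cores:.1f}%")  # CUDA cores: -10.9%
print(f"Power: -{power:.1f}%")       # Power: -5.6%
```

Dividing by the new value (425W) instead of the original (450W) would overstate the power cut at roughly 5.9%.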

Thanks to a slightly higher clock speed, the 4090D is only about 5% slower than the 4090 in some benchmarks. Compared with the AI chips, this gap seems acceptable.

The four China-specific chips launched at the end of 2023 eased Nvidia's predicament in China to some extent, keeping it from being left with nothing to sell. But after two rounds of restrictions, large enterprises and small and medium-sized companies alike have begun looking for other options: buying domestic chips, building servers overseas, or buying H100, H200, and A100 chips through unofficial channels. A sense of helplessness has spread among Chinese companies.

Lao Huang's knife skills

DIY players familiar with gaming GPUs know the knife skills of "Lao Huang" (Jensen Huang) well: slicing a single chip into multiple product tiers.

For a recent example: a year after launching the RTX 20 series, NVIDIA released the RTX 20 Super series to better differentiate its product line and to respond to rival AMD's new RX 5000 series.

Using just two dies, TU106 and TU104, NVIDIA produced five graphics cards: the RTX 2060, RTX 2060 Super, RTX 2070, RTX 2070 Super, and RTX 2080. The smallest gap is between the RTX 2060 Super and the RTX 2070, both built on TU106: their theoretical performance differs by only about 5%, and their actual benchmark and game results are very close. NVIDIA took product cut-downs to the extreme.

The China-specific chips Nvidia now modifies and releases are simply a return to this old playbook.

Besides the B20 mentioned at the beginning, Nvidia also plans to wield the knife in the consumer market. According to reports, a cut-down RTX 5090, the RTX 5090D, is expected to launch in January 2025. It is expected to use Nvidia's Blackwell architecture and TSMC's 4NP process, and its core specifications may be reduced to stay within US export restrictions.

Counting these two rumored chips, Nvidia's lineup of special products for China is already large: A800, H800, H20, L20, L2, RTX 4090D, B20, and RTX 5090D.

Some are optimistic about these special edition chips. Research firm SemiAnalysis estimates that Nvidia will sell more than 1 million H20 chips in China this year, worth more than $12 billion.

But Nvidia has plenty more to worry about. According to Jefferies analysts, when the United States conducts its annual review of semiconductor export controls in October, it is "very likely" that Nvidia's H20 will be banned from sale to China. The analysts said the ban could be implemented through "specific product bans, lowering the computing power ceiling and/or limiting memory capacity."

Moreover, compliance cards like the H20 are essentially deliberately cut-down versions of existing dies, and the B20 will be no different. Silicon that could have become an H200 or B200 is instead sold as a cheaper special edition, with a sales life likely to be little more than a year. However you look at it, it is a losing business.

But Nvidia has no way out. It can only do its best to balance regulatory rules against market demand. The question is how many Chinese companies are willing to keep paying for special edition chips that are cut down again and again.