
Encirclement and suppression of Nvidia | Deep Krypton

2024-08-01


Text | Qiu Xiaofen

Interview | Qiu Xiaofen, Yang Xiao

Editors | Su Jianxun, Yang Xuan

Resistance

A secret WeChat group has spread by word of mouth among employees of AI chip companies in Zhangjiang, Shanghai. Its name: the "Domestic Chip Companies Keeping Each Other Warm" group. Only employees of domestic chip companies may join. Here, even competitors exchange information and business resources.

This "keeping each other warm" group is a designated ceasefire zone where peers can temporarily suspend hostilities and help one another, simply because they all share a common enemy: Nvidia.

Because of NVIDIA's existence, domestic chip companies have endured no small amount of humiliation in their sales work.

Li Ming (pseudonym) is a salesperson at a domestic GPU company. When this wave of AI frenzy began, he went to meet clients full of confidence. But before he could even say hello, the client fired off a barrage of questions:

"What's the difference between your product and Nvidia's A100? Nvidia has NVLink; what do you have?" (Author's note: NVLink interconnects multiple GPUs directly, avoiding the need to route GPU data through the CPU for computation, thereby improving computing efficiency)

Seeing that their product and technology alone could not win customers over, Li Ming's team tried working their connections, finding "more influential people" to lobby on their behalf. The customers still shook their heads: "We'd rather use NVIDIA."

The NVIDIA A100, with 54 billion transistors packed into 826 square millimeters, is the key that unlocked the Pandora's box of large AI models.

Training a large model is like "refining an elixir" from massive data: the goal is to find the patterns by which the data changes. Training a large model on NVIDIA chips is like asking hundreds of millions of people with an IQ of 200 to do arithmetic; other chips are equivalent to finding a few thousand people with an IQ of 100 to do the same calculation.


NVIDIA A100. Image source: NVIDIA official website

Top technology companies are scrambling to buy from Nvidia. Whoever owns the most high-end NVIDIA GPUs gets the chance to train smarter large models.

Public information shows that OpenAI currently holds the world's largest stock of NVIDIA high-end GPUs, at least 50,000; Google and Meta also own clusters of tens of thousands of cards (about 26,000); and the only Chinese Internet giant with a cluster of NVIDIA high-end GPUs on that scale is ByteDance (13,000).

NVIDIA monopolizes the best resources in the global supply chain: it commands the largest share of TSMC's advanced-process capacity, has the world's largest base of engineer-users, and controls the computing lifeline of a raft of AI companies.

Absolute monopoly often breeds dissatisfaction, anger and escape.

"Today, almost every company building large models is losing money. Only one company is making money: Nvidia," one industry insider said indignantly. "Nvidia's profit margins make every customer uncomfortable and hurt the AI industry!"

Financial reports show Nvidia's gross margin at 71%, with the popular A100 and H100 series reaching as high as 90%. A hardware company, Nvidia enjoys higher gross margins than Internet software companies.

High prices and fat profits have driven Nvidia's major customers to flee. On July 30, Apple announced that its AI model had been trained on 8,000 Google TPUs, with zero NVIDIA content. The next day, Nvidia's stock fell more than 7%, its largest drop in nearly three months, erasing $193 billion in market value, roughly the entire market capitalization of Pinduoduo.


Nvidia's stock price decline in the past year

For every domestic GPU company hoping to tear a piece of meat from Nvidia, 2022 was a turning point. After the United States issued several rounds of bans, Nvidia, to protect its business, kept launching cut-down versions of its chips for China, only to see them banned in quick succession:

  • In September 2022, the A100/H100 were banned from export to China, and NVIDIA launched the cut-down A800/H800; in October 2023, the A800/H800/L40/L40S/RTX 4090 were banned from export to China; in June 2024, NVIDIA founder Jensen Huang said the cut-down L20 and H20 chips would be pushed to China.

But the cut-down versions triggered even fiercer criticism from the industry. Nvidia's forthcoming H20 costs half as much as the H100 but delivers only a third of its performance. One AI industry insider fumed: "Isn't this just daylight robbery? A pure IQ tax!"

As Nvidia's customers grew dissatisfied and angry, the domestic chip companies hoping to replace it were "nourished" by that sentiment.

In the past, they could only trail behind Nvidia, scrabbling for crumbs of the cake. According to data from semiconductor analysis firm TechInsights, Nvidia's share of data center GPU shipments reached 98% in 2023. Domestic chips and a number of chip giants combined accounted for a meager 2%.

Now that the bans have arrived, a crack has opened in Nvidia's once-perfect position in the Chinese market. Who can replace Nvidia? Domestic AI chip makers see their chance.

"This year, Nvidia has released 90% of its market share in China. Whether you can grab it depends on your ability," said the founder of a domestic GPU company.

In 2021, 36Kr published "Deep Krypton | CATL: Cracks in the Trillion-Dollar Battery Empire." In the power battery industry, CATL is far ahead and coveted by its competitors.

Today, Nvidia, dominant in AI chips, is likewise a thorn in the side of many of its peers. The difference is that Nvidia's moat is higher and the gap with its competitors wider.

We try to find "Nvidia's cracks" through the resistance of its opponents. Across the GPU industry, domestic GPU and AI chip makers are weak but understand the Chinese market better and run more localized strategies, while old chip giants such as Intel and AMD have ampler food and ammunition to confront Nvidia head-on.

Nvidia will not be defeated in the short term, but it will not be unscathed either. This is destined to be a bloody war.

Breakout

To break out, you must find your opponent's weakness. One of Nvidia's weaknesses is hubris.

The chip industry is, at its core, a To-B business. Customers need chip makers to provide "accompaniment" services, such as debugging the hardware and ensuring software-hardware compatibility. Only when that accompaniment is in place do customers stay loyal and chip products resist replacement.

Yet several domestic chip professionals told 36Kr that in the Chinese market, apart from billion-dollar-scale purchasers like BAT and ByteDance, most companies find it almost impossible to get after-sales service from NVIDIA even with transaction volumes in the tens of millions.

In other words, when Chinese engineers using NVIDIA chips hit a problem, they can only search the documentation on NVIDIA's official website or work it out in community forums on their own.

When working with NVIDIA, Chinese customers' demands often go unmet. A chip industry insider told 36Kr that NVIDIA generally promotes its highest-spec, most expensive complete solution in China, and when customers request customization for specific scenarios, they are generally refused. After buying the cards, customers must "figure it out on their own, or find a capable algorithm company to get it done."

These practices have earned Nvidia plenty of complaints from small and medium-sized customers. "As a big company now, Nvidia no longer attends to small customers the way it used to. Its products have no challengers, so it doesn't need to work hard to please anyone," said the person quoted above.

Yet the rise of NVIDIA's ecosystem once proved exactly how much service matters in the chip industry: in 2006, when the CUDA ecosystem was just starting out, NVIDIA's products were no better than today's domestic chips. But the NVIDIA team promoted CUDA first among university research groups, then worked its way into startups in each vertical to adapt software and hardware, which produced today's huge market.


NVIDIA H100. Image source: NVIDIA official website

Chinese chip manufacturers have also realized this and are trying to start with customer service.

Since 2023, a domestic AI chip company that declined to be named has been sending back-office R&D staff to the front line to provide personal service. Not only must they be on site for joint training; after a sales order lands, a dedicated support group with R&D staff is created for the customer. Customers with order values from the hundreds of thousands to several million yuan can all enjoy 24/7 consultation.

Localized, attentive service alone is not enough. With NVIDIA pulling back from China at scale, the chip contest is no longer simply a battle between the chip products themselves; it also tests each company's grasp of the time window. Chinese chip makers pounced like a wolf pack, and a vigorous scramble for orders began.

Huawei is the most aggressive. It earlier launched the "Spark All-in-One" machine with iFlytek, equipped with the Ascend 910B.

The chip was once said to match Nvidia's A100 in single-card capability. Less well known is the grind behind the glamorous showcase: 36Kr learned that Huawei spared no labor cost, deploying hundreds of engineers to help iFlytek tune parameters.


Huawei and iFlytek released the Spark all-in-one machine. Image source: iFlytek

Although the industry calls this "hand-crafting," once the benchmark case was out, many large-model and Internet companies extended olive branches to Huawei for testing.

A domestic chip sales company was surprised to find that since July last year, senior Huawei managers have been on site at every open intelligent computing center project. "Huawei can now send hundreds of people to serve a single project, and on some key projects it will even take a loss in order to win revenue from others."

The unnamed chip company above also fields 200 hard-charging salespeople, a rare configuration in the domestic chip industry. Its sales team started with the three hottest areas for large-model deployment: finance, law, and industry. They appear at almost every computing-power-related exhibition. "In the chip industry, resources come first. If you run slowly, you die."

A secret price war for domestic chips has also begun.

A chip industry insider told 36Kr that their goal is to win more orders from benchmark intelligent computing centers, regardless of unit price. 36Kr has observed that, to cut costs, some domestic companies' inference cards go so far as to strip out expensive HBM (high bandwidth memory), and some even ship at 50% below cost.

"No matter what, everyone still hopes to make breakthroughs from various entry points and take a small piece of the pie from Nvidia so that Nvidia will no longer be the only dominant player."

But reality is cruel. On product merits, domestic AI chips inevitably suffer problems of all kinds.

A chip expert gave 36Kr an example: processing the same dataset might take ten days on an NVIDIA A100 cluster but several months on some domestic chip products. Domestic chip hardware has too short a track record and lacks advanced process nodes; the hardware gap translates into low efficiency.

The software shortcomings are just as plain. Another industry insider found in testing that when running large models on domestic chips, if you want to build flashier applications on top, the chips are prone to crashing the moment the underlying base model is swapped. "In many cases, people basically use domestic chips while holding their noses."

Today, with the "encirclement" campaign squarely in front of them, companies have gradually split into more pragmatic paths:

A few still press on toward ten-thousand-card clusters, focusing on training scenarios and competing with NVIDIA, represented by Moore Threads and Huawei; but the majority choose to focus on deploying large and small models across industries, starting from inference scenarios with less demanding hardware and software requirements, represented by Suiyuan, Tianshu Zhixin, and others.

(36Kr note: Large models involve two stages, training and inference. Training, the process of finding patterns in billions of data records, "makes" the large model; inference "uses" it. Inference is less difficult, has lower hardware and software requirements, and sits closer to industry.)
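The note's training-versus-inference distinction can be seen in a toy sketch in plain Python. The `train`/`infer` functions and the one-variable linear model here are purely illustrative inventions, nothing like a real large model, but the shape of the work is the same: "training" is the expensive search for a pattern in the data, while "inference" merely applies the pattern already found.

```python
# Toy illustration: "training" finds the pattern (slope and intercept)
# hidden in the data; "inference" just applies it, which is far cheaper.

def train(xs, ys):
    """Least-squares fit of y = slope * x + intercept ("making" the model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def infer(model, x):
    """Apply the trained model to a new input ("using" the model)."""
    slope, intercept = model
    return slope * x + intercept

model = train([1, 2, 3], [2, 4, 6])   # learns y = 2x
print(infer(model, 4))                # -> 8.0
```

Scaled up by many orders of magnitude, this is why training clusters demand the most capable hardware while inference can tolerate weaker chips, the opening domestic vendors are targeting.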


Moore Threads released its KUAE ten-thousand-card cluster at the 2024 Artificial Intelligence Conference. Photographed by 36Kr

"We are not blindly chasing after Nvidia. We cannot afford it, and we dare not blindly make chips with ultra-large computing power," a chip industry insider said bluntly.

A pragmatic calculation for domestic chipmakers: inference is not Nvidia's main battleground, so they can avoid competing with Nvidia head-on.

Previously, for cost reasons, most companies ran inference on Nvidia's consumer gaming card, the 4090. These cards came with plenty of problems: excessive power draw, insufficient memory, and eventually an export ban. Nvidia also officially disallows using these consumer cards for large model inference.

Domestic chip companies moved into the gap Nvidia left. Tianshu Zhixin and Suiyuan are both pushing inference cards benchmarked against the 4090 this year, selling on large memory, low power consumption, and stable supply.

Domestic chipmakers have also learned the importance of picking the right market segments: low-power small chips for power-sensitive scenarios, for instance, or niches such as video optimization, doing small-but-beautiful business.

The battlefield of giants is tense

When domestic GPU companies wrote "surpass NVIDIA" into their pitch decks, it read more like a beautiful vision. These young companies rode the wave of domestic substitution; surpassing NVIDIA even slightly would count as a feat. NVIDIA is at once their opponent and their benchmark.

But for Intel and AMD, Nvidia's generational peers, the atmosphere is tenser still.

"We consider Nvidia our mortal enemy internally," a researcher on AMD's MI series product line told 36Kr.

At Computex 2024 in Taipei this June, AMD CEO Lisa Su (a cousin of Nvidia founder Jensen Huang) laid out for the first time AMD's future GPU cadence: a new GPU product every year, matching Nvidia's update rhythm.

Almost every time Nvidia launches a GPU, AMD buys one on the market and tears it down as soon as possible to compare it with its own yet-unreleased products.

"We need to add some features here, raise the parameters there." What they pursue, the person told 36Kr, is "not lagging behind NVIDIA in hardware, while holding a slight edge on parameters."


Lisa Su unveils the Instinct MI325X at this year's Computex in Taipei

Since 2023, AMD's Chinese ecosystem partners have received new software optimization requests from AMD almost every other day. To push its GPUs, AMD executives sometimes have the stronger CPU department bundle GPUs into orders, at the risk of losing the CPU sale.

"AMD people are pleading with us every day, hoping we'll help build the ecosystem," said an executive at an ecosystem company. According to him, more than 10 cloud vendors and To B customers in China are currently adapting and validating AMD chips.

Compared with the pack of anxious domestic chipmakers, the foreign giants' hardware advantage lies in advanced process nodes and HBM capacity. AMD's and Intel's products are therefore not far off Nvidia's, and in some respects even better.

Official figures: AMD's MI300X (released in December 2023) previously claimed 1.2 times the computing power of Nvidia's H100;

Intel's Gaudi 3 (released in April 2024) likewise far exceeds the H100 in energy efficiency and inference performance, by Intel's own account. They are also cheaper: AMD's GPUs are priced at roughly 70% to 80% of Nvidia's comparable products.

But every manufacturer competing head-on with Nvidia faces a common problem: however strong the hardware advantage, it is dimmed by the software disadvantage, the short stave of the barrel.

Back when GPUs were used only for graphics computing, NVIDIA launched CUDA, a software platform that gave developers a set of programming interfaces for easily writing general-purpose GPU programs in the languages they knew best.

"Let me give you an example of why CUDA can't be surpassed. It's like a language you've learned and worked in for many years. If I ask you to switch languages, won't you feel uncomfortable and unwilling?" an employee of a chip company told 36Kr.

CUDA is the deepest moat in NVIDIA's software ecosystem. Even Intel and AMD, big and powerful as they are, cannot overtake it in short order.

A former member of Intel's GPU team told 36Kr that they had deployed more than 3,000 engineers worldwide and invested three to four years, yet only raised the accuracy from 0% to 4%: they used an Intel chip to convert a portrait, waited a long time, and got output so degraded that "it could no longer be recognized as a human face."


Intel CEO Pat Gelsinger releases the Gaudi series chips

The chicken-and-egg dilemma reappears. Precisely because few people use AMD's and Intel's GPUs, and fewer still use their software platforms (ROCm, oneAPI), hardly anyone can draw out their true hardware capability.

"NVIDIA's CUDA has always had so many developers iterating algorithms on it, making NVIDIA's inference and training highly efficient. That's why NVIDIA always has pricing power and always knows what its next chip should do. It's a headache for both AMD and Intel," said the CEO of an AMD ecosystem company. In his words, AMD's software tool ROCm is "like Nvidia's CUDA 20 years ago."

But for downstream customers, this is where the risks arise.

Validating a large model is already an experiment full of uncertainty; running it on an unproven chip stacks two uncontrollable variables on top of each other. Abandoning NVIDIA means paying huge migration costs and shouldering that uncertainty.

Despite this, the encirclement and suppression of Nvidia is still a battle that AMD and Intel have to fight.

The global chip landscape splits into three: the x86 architecture rules the PC field, dominated by Intel and AMD; Arm dominates mobile; and Nvidia dominates artificial intelligence.

In the year and a half since the new AI revolution began, Nvidia has crossed the $3 trillion market cap mark, now equivalent to about seven times the combined market value of Intel and AMD.

Twenty years on, the chip giants are "encircling" Nvidia once more. It is another tense battle, and a belated counterattack.

The real crack

With domestic AI chip companies massing like an army of ants and AMD and Intel going all out, is Nvidia, under such siege and blockade, really shaken?

The cracks in the Nvidia empire are quietly spreading.

One signal Nvidia must watch: OpenAI, Google, Microsoft... the big customers who poured money into Nvidia out of faith in AI are taking their first steps toward "de-Nvidia-ization."

Self-developed chips are a long-planned move for all of them. A former core member of Google's TPU team told 36Kr that Google, which uses a quarter of the world's computing power, "may stop purchasing chips externally by the end of the year."

In the past, Google's self-developed TPU was driven mostly by cost: worries that Nvidia might raise prices at will, or that supply might prove unstable. Now Google's chip-making strategy is more radical, "almost regardless of price and cost investment."

OpenAI has laid plenty of groundwork. It reportedly plans to raise up to $7 trillion to build a new AI chip empire.

At home, 36Kr has learned from multiple sources that Nvidia's largest buyers in China, Alibaba, ByteDance, and Baidu, are essentially all quietly developing chips for large model training.

Product progress of overseas cloud vendors, large model vendors, and star chip vendors. Chart compiled by 36Kr from public information

Developing one's own chips, however, is a long game. The shorter-term option for these big customers is to try the products of Nvidia's competitors and reduce their dependence on Nvidia.

AMD is the Plan B. An AMD insider told 36Kr that AMD's GPUs have opened up major customer markets in Europe, the United States, South Korea, and elsewhere. Microsoft has purchased tens of thousands of units, and Tesla, Midjourney, U.S. national laboratories, and Korea Telecom have also taken bulk deliveries.

In China, Chen Wen, an employee of an AMD ecosystem company, said several hundred units of one AMD accelerator card model shipped in 2023. Not a large number, but "this AMD product could hardly be found in China before."

According to AMD's previous optimistic forecast, by the end of 2024, data center GPUs will bring AMD up to US$2 billion in revenue.

The army of Chinese chipmakers has yet to pose a substantive threat to Nvidia, but they are becoming sparks that could start a fire.

36Kr has learned that domestic training and inference chip sales have jumped to a new level. One gratifying signal: domestic Internet companies and large-model companies, widely seen as the hardest customers to win, have opened their doors to domestic chipmakers.

According to 36Kr, Ascend chips have already made their way into the systems of Internet companies including Baidu.

Meanwhile, domestic AI companies such as Zhipu AI, MiniMax, and Jieyuexingchen are all training large models with trillions of parameters. With high-end NVIDIA chips in short supply, large-model companies generally choose "mixed training" (NVIDIA plus other chips); Zhipu AI's cluster, for example, reserves nearly half of its capacity for Ascend chips.

In addition, since last year both Tianshu Zhixin and Suiyuan have shipped tens of thousands of inference chips, with major domestic intelligent computing centers among the channels. The former has entered the supply chain of large-model maker Baichuan. On Baidu's Kunlun chip side, cumulative shipments of the past two generations of inference chips total 30,000 to 50,000 units, split roughly evenly between Baidu itself and external channels.

"Nvidia's current prices and supply are right at the threshold that tests whether everyone makes the switch, and how," an industry insider said bluntly.


Nvidia founder Jensen Huang. Image source: Visual China

Looking further ahead, over the next three to five years, new threats to Nvidia are gradually taking shape.

New AI chip architectures beyond the GPU have also emerged. Silicon Valley chip company Groq, for example, with its LPU architecture, claims to run large language models "ten times faster" than Nvidia's GPUs.

There is also Silicon Valley startup Etched, which released a large-model ASIC chip claimed to be "an order of magnitude faster" than Nvidia's GPUs. Behind these chip startups stand star investors such as OpenAI.

36Kr learned that new AI chip startups have also emerged in China this year. For example, Shanghai has recently secretly supported two new AI chip companies.

Yang Gongyifan, CEO of domestic TPU company Zhonghao Xinying, said that a GPU's overall transistor utilization is only about 20%, a glaring defect. By comparison, new architectures such as TPUs and ASICs, though less general-purpose, can reach 60%-100% transistor utilization. "In the next three to five years, there will certainly be many AI chips beyond the GPU architecture, at home and abroad."

A few sparks like these are enough to shake a giant like Nvidia.

"Do you think Nvidia is as unrivaled as it looks? It isn't," an Nvidia employee told 36Kr. As Jensen Huang often says, "We are only 30 days away from going bankrupt."

Nvidia prepared for more than a decade, and only when it met a genius company like OpenAI did the Nvidia miracle come into being. The semiconductor industry has never been short of stories of giants being overtaken.

The smoke from the battle to encircle Nvidia is already rising.
