
Silicon Valley giants make chips, and the "entry ticket" is $2 billion

2024-08-01


"Core Matters" is Tencent Technology's semiconductor industry research project. This issue focuses on the logic and challenges of technology companies developing their own AI chips.

Author: Xinchao IC Ah Niu

Edited by Su Yang

“Our Nvidia chip reserves can no longer keep up,” admitted the CEO of a large AI model company.

As export controls tightened further, China-specific chips such as the A800 and H800 became hard to buy and were replaced by the compliant H20. The latter's performance is greatly reduced, and outside observers call it a cut-down version. Even so, the H20 may itself come under export controls when the US Department of Commerce updates its rules this October.

The Financial Times, citing two anonymous sources close to Nvidia, reported that several Chinese companies have placed orders totaling $5 billion for Nvidia chips. Meanwhile, some domestic chips have entered the major technology companies' field of vision, but owing to process technology, interconnect, and other factors, a performance gap remains and supply is also a challenge.

Against this backdrop, many large companies have launched in-house development and begun production at TSMC across multiple process nodes, including 5nm and 7nm, to secure their own AI chip supply.

Export controls are a double-edged sword: they have choked off computing power, and they have also capped Nvidia's growth. With customers turning to self-development, Nvidia's mainland revenue has begun to waver. In fiscal 2022, Chinese customers contributed 25% of Nvidia's revenue; by fiscal 2024 the share had fallen to single digits.

For Nvidia, the pie in mainland China is shrinking, and Silicon Valley is shifting too. Major customers such as Google, Apple, Meta, Amazon, and Tesla are hedging their bets: buying Nvidia chips while developing their own.

What is the logic behind self-developed chips at the big Chinese and American players? And can the technology companies pouring into this red ocean land safely on the strength of their own chips?

01 Hard currency in hand brings a sense of security

At a time when large models and generative AI are sweeping the world, computing chips are the hard currency in technology companies' hands. Xinchao IC previously tracked the frenzy and anxiety of the scramble for computing power in the article "Sky-high-priced H100 flows to the black market".

Holding the cards in one's own hand is the fundamental reason why large manufacturers develop their own chips.

In the view of Chen Wei, chairman of Qianxin Technology, large manufacturers hold three trump cards: securing supply, cutting costs, and strengthening bargaining power. Taken together, they amount to chip autonomy.

Many large Chinese companies, especially in the Internet and artificial intelligence sectors, face the risk of a computing power cut-off at any moment under advanced-chip export controls; self-development is their guarantee of computing power security. That said, the chips each company develops are mainly for internal use, with specifications customized for its own products rather than sold as general-purpose parts.

For giants "burning money" to expand computing power, self-development is a way to cut costs. "Only when the scale is large enough and demand is large enough is self-development worth considering; otherwise it may not actually reduce costs," said Wu Zihao, a former TSMC fab construction expert.

In 2021, Musk unveiled Dojo, a supercomputer built on Tesla's own D1 AI chip, to train Tesla's Autopilot. According to a recent Morgan Stanley research report, the system saves $6.5 billion compared with using Nvidia's A100.

As AI demand rises, cloud vendors' dependence on GPUs far exceeds their dependence on CPUs, and demand for Nvidia chips has gone wild. Self-developed chips are also a bargaining chip when cloud vendors negotiate Nvidia orders.

A person close to Amazon told Xinchao IC that Nvidia's cards are not cheap. If a large vendor has its own dedicated (DSA) chips, it can not only pull down the average cost of chips and licensing, but also bargain with Nvidia from a stronger position.

Public information shows that Amazon not only designs its own computing servers, storage servers, and routers, but also develops its own server chip Graviton.

AWS launches general purpose Graviton4 processor

According to The Information, Amazon keeps lowering prices by substituting Graviton for Nvidia, and customers who rent Graviton servers save 10% to 40% on computing costs outright. From Nvidia's perspective, to retain Amazon, the world's largest cloud customer, it has to sit down at the table and negotiate a better price.
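As a rough illustration of what that 10% to 40% range means at fleet scale, here is a minimal back-of-envelope sketch. The hourly rate and fleet size are invented for illustration; only the discount range comes from the report.

```python
# Back-of-envelope sketch of the savings The Information describes.
# The hourly rate and fleet size are illustrative assumptions;
# only the 10%-40% discount range comes from the report.

def annual_cost(hourly_rate: float, instances: int) -> float:
    """Cost in USD of renting a fleet around the clock for one year."""
    return hourly_rate * instances * 24 * 365

assumed_rate = 1.50   # hypothetical $/hour for a comparable x86 instance
fleet = 1_000         # hypothetical fleet size

baseline = annual_cost(assumed_rate, fleet)
for discount in (0.10, 0.40):
    print(f"{discount:.0%} discount -> ${baseline * discount:,.0f} saved "
          f"per year on a ${baseline:,.0f} baseline")
```

Even at the low end, the savings on a modest fleet are the scale of a sizable engineering budget, which is the leverage the article describes.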

"This kind of profit-sharing may not always be fully reflected in the discount, but may be reflected in the configuration."

The aforementioned source explained that, as the world's top chip vendor, Nvidia would badly damage its pricing system and undermine price protection if it offered outright unit-price discounts. Instead, it can give large customers disguised discounts: upgraded interconnect equipment, upgraded SSD storage, extra rack configurations, and so on.

There is an even more common form of preferential treatment: tilting production capacity toward a customer and granting first-launch rights.

With that first-mover advantage, Amazon can price the complete system higher in the short term and earn the "discount" back through the whole-machine premium and the distribution of the supporting software toolchain.

Beyond securing supply, cutting costs, and strengthening bargaining power, some large manufacturers develop their own chips above all to protect a unique competitive edge.

Chen Jing, vice president of the Fengyun Society, noted that the chips Nvidia sells are built for general-purpose computing: comprehensive in function, but correspondingly expensive. Some customers need only specific functions to sharpen their own advantages, and in those cases they consider self-development.

"If I only need large-model inference and no training functions, I can design a dedicated chip that is simpler, faster, and cheaper," Chen Jing said. "Large companies like Google and Microsoft have their own software and hardware system specifications. Nvidia may not meet their standards for noise control or power consumption, so it is more convenient to design the chip ourselves."

Among the Silicon Valley giants, Google cares deeply about differentiating its own architecture, cost, and chip technology. Since 2016 it has developed its own AI tensor processing unit (TPU) to achieve better cost-effectiveness and performance in large- and medium-scale training and inference, giving its cloud computing products sharper distinctiveness and recognition.

Google launches sixth-generation TPU "Trillium"

According to data Google disclosed for the fourth-generation TPU, at comparable system scale the TPU v4 is up to 1.7 times faster than Nvidia's A100 and up to 1.9 times more energy-efficient.

Beyond all of the above, there is a deeper, ecosystem-level reason: breaking the CUDA monopoly. CUDA, Nvidia's proprietary programming platform, is a main reason GPU prices have skyrocketed, and customers have had no choice but to accept it.

If cloud vendors did not develop their own chips, then even with good order prices, more than 95% of the processors in their data centers would still be Nvidia GPUs, and all cloud AI demand would still rest on the CUDA ecosystem. In the final analysis, the keys would remain in Nvidia's hands.

As cloud vendors equip data centers with self-developed chips and build extensive low-level middleware and binary translation features to help customers migrate to their own ecosystems, equivalent compatibility with CUDA programs will improve and exclusive dependence on Nvidia will gradually fade.

"This is what all cloud companies are doing. Even though processors may account for less than 4% of the entire data center, they still have to insist on doing this," said the person familiar with the matter.

02 If you have people and money, then go ahead

"If you have people, money, and things to do, and it has future volume, then you can get it done."

 

Yu Hao, vice president of Lenovo Holdings, believes self-developed chips are a logical move for large companies: their customers are already in place, which is a clear advantage.

"The 'people' have to be 'experts' with practical experience in the entire life cycle of chips, and the 'money' has to be 'active money' that continuously contributes to revenue through computing power business. In this way, large manufacturers can rely on the closed loop of AI business, take stock of the foreseeable market growth in the future, quantify computing power demand, complete the strategic coordination of people and money, and develop their own chips naturally." Yu Hao told Xinchao IC.

However, the entry fee for a seat at the self-development table is at least $2 billion. There were even rumors that OpenAI CEO Sam Altman had an audacious plan to raise $7 trillion to build chips, though he later denied them.

According to an insider, "For each company iterating its first-generation product, taking an intermediate 7-nanometer node as the baseline and adding mass production, the cost is at least $2 billion."

Moreover, self-developed chips are mostly for internal use, so the difficulty of building an ecosystem can largely be set aside. Ranciyuan quoted a partner at Tianying Capital: "Specialized chips generally do not need particularly complex architecture design, their application characteristics are very clear, and development is relatively fast. For many Internet companies, the scenarios are well defined, so they need not spend heavily in money or time on the software ecosystem, and the process IP is mature."

The logic holds on paper, but how does self-development actually proceed?

By industry convention, chip self-development splits into two parts: front-end and back-end. The front-end is logic design, defining the chip's basic functions; the back-end is physical design, turning those functions into actual circuits and taping out.

Schematic diagram of self-developed chip process

Generally speaking, the only things a small team can complete independently are the front-end and back-end of the logic chip plus the software toolchain itself. Even then, many designs contain only 5% self-developed dedicated circuitry.

The aforementioned insider said, "Everyone in the market does a fifth of the work themselves and leaves the remaining four-fifths to others. It is a fairly mature ecosystem. As for how the circuit IP is obtained, some channels are improper and some are compliant, such as licensing from IP vendors like ARM."

Many teams could in fact design some of these circuits themselves, but working around tightly protected IP means that even a successful design may deviate from the architecture and be unusable, or still end up infringing someone else's IP. This is the first hurdle domestic manufacturers hit when developing their own chips.

There are also things an in-house team simply cannot design, such as certain very difficult network-on-chip (NoC) structures.

Design is only one link in the chain. Self-development also spans tape-out, mass production, and more, and problems can arise anywhere along the way, from failed tape-outs to capacity shortfalls. Even these are not the end of it: a series of supporting problems must be solved, including how to integrate the industry chain.

Viewed from the outside, a dedicated logic chip is about 500 mm², while a general-purpose GPU can reach 800 mm² and contain tens of billions, even hundreds of billions, of transistors. Part of the die handles vector computation and can be regarded as a complete vector processor. But to land in real application scenarios, the chip also needs storage, power management, power delivery, and whole-machine operating design, plus interconnect and networking to form a larger cluster.

On top of that, the product must ultimately differentiate itself, which happens at the peripheral interfaces and the whole machine, yielding different SKUs. The industry usually compares storage, energy consumption, and efficiency per square meter at the whole-machine level.

In other words, beyond industry chain integration and IP issues, self-developed chips must also account for product SKUs; it is never just a matter of designing a logic chip.

A senior figure who asked not to be named told Xinchao IC, "Many domestic companies lack product design capability. After making a chip, they have to shop it around for testing. Their engineers and BD staff camp out in the data centers of big players such as Inspur and Sugon, hoping the other side will reserve a socket for them on a new server motherboard and, if the test succeeds, buy a batch. So far, very few have succeeded."

Mass production is another challenge besides design, tape-out and productization. Small teams must consider whether they can reserve production capacity.

"Output is always a very critical number for a Fab." Wu Zihao, a former TSMC plant construction expert, said that manufacturers must grasp at which link they enter and how much output they promise, which is the most critical point to impress the Fab.

First-tier manufacturers have essentially already booked a fab's newest capacity during the DTCO (design-technology co-optimization) stage. The world's top design teams, such as ARM's, station large numbers of people at TSMC every year, as do many EDA vendors.

DTCO pins down the performance targets of next-generation processors at a given node: for example, how much cost can be saved and how much performance gained through sensible layout design at 3nm.

"Nvidia and Apple are always willing to try out the most advanced processes. As long as TSMC comes up with a most advanced process, even if the yield is unknown and the performance gain is unknown, that is, the economic model cannot be calculated, they will already reserve production capacity and conduct DTCO collaborative design with the Fab factory. This is the fundamental reason why first-tier manufacturers can obtain production capacity." said Wu Zihao.

When co-design is not done from the start, the fab and the fabless team swap the roles of Party A and Party B. Since no one dares to use the new process, the fab can only promote it bit by bit, starting with the lowest-risk chips.

Crypto-mining chips, for example, have a very simple structure and a very small die, making them well suited to early trials. The first customer for Samsung's 3nm process was a Chinese mining-rig maker.

Wu Zihao said, "After the fab factory successfully tests the waters with small customer orders, it can then try to mass-produce PC CPUs, mobile phone chips, and finally AI chips, step by step."

At present, large domestic companies have each poured at least billions of dollars into self-developed chips. A startup wanting a seat at the table must either have enough early customers, or an application platform to support chip adaptation and trial-and-error, or else sufficient capital or strong financing capability. At least one of these conditions must be met.

Chen Wei told Xinchao IC that a commercial company without low-cost labor, and without backing from universities or research institutes, needs no less than RMB 500 million to 1 billion in financing before mass production; with support from a research institute or similar, labor costs fall considerably and the threshold drops somewhat.

“If there is not so much money, but the startup has strong cost control capabilities and can make full use of upstream and downstream collaboration to reduce costs, that’s fine. Everything is done to ensure the continued development of products.”

On fab selection, people familiar with the matter say startups first try to book TSMC capacity, then GlobalFoundries; but GlobalFoundries lacks advanced processes and packaging, so they turn to SMIC, where bookable capacity is already scheduled out to the year after next.

Investment institutions take a longer-term perspective on this situation.

Yu Hao said that domestically developed high-end chips will inevitably face tape-out bottlenecks in the short term. In the long run, the outcome depends on how fast advanced process lines and capacity expand at SMIC and its peers, which relies mainly on domestic "internal circulation." Manufacturers with high-end design capability, however, may consider going overseas, letting external circulation drive the internal one; going abroad opens new opportunities.

03 It’s not easy to be Nvidia’s “gravedigger”

When old customers strike out on their own, Nvidia is the one who gets hurt.

The global wave of self-development keeps gaining force. Beyond mature results such as Google's TPU and Amazon's Graviton, the entire technology circle has recently been flooded with news of Sohu, billed as "the world's first Transformer-specific AI chip."

US chip startup Etched launches Sohu, a Transformer-specific AI chip

The chip bakes the Transformer architecture directly into silicon, and its inference performance is claimed to far exceed GPUs and other general-purpose AI chips: reportedly 10 times that of the B200, the flagship released only this March. A server fitted with 8 Sohu chips is said to match a cluster of 160 H100 GPUs and to process up to 500,000 Llama 7B tokens per second.
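Taking those numbers at face value, a little arithmetic shows what they imply per chip. These are Etched's own claims, not independent benchmarks.

```python
# Arithmetic on Etched's published claims (vendor figures, not verified).
sohu_per_server = 8
claimed_h100_equivalent = 160     # one 8-Sohu server ~ a 160-H100 cluster
claimed_tokens_per_sec = 500_000  # claimed Llama 7B throughput per server

print(f"Implied per-chip ratio vs H100: "
      f"{claimed_h100_equivalent / sohu_per_server:.0f}x")
print(f"Implied throughput per chip: "
      f"{claimed_tokens_per_sec / sohu_per_server:,.0f} tokens/s")
```

That works out to a claimed 20x advantage per chip, which is exactly the kind of specialization payoff, and the kind of headline, that draws fabs and customers in.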

This hotshot appeared out of nowhere, and fabs and partners alike were overjoyed.

The company reportedly works directly with TSMC's 4nm process to produce Sohu and has secured ample HBM and server supply from top suppliers. Early customers have already booked tens of millions of dollars in hardware. Netizens have nicknamed Etched "Nvidia's gravedigger."

But will the gunsmoke of self-development really shatter the Nvidia myth? Not quite.

The semiconductor industry has a famous pattern, Makimoto's Wave: chip types swing back and forth between standardization and customization. For a stretch of time, general-purpose architectures sell best and dominate; past a certain point, they lag in meeting specific needs, and specialized architectures take over.

NVIDIA is the undisputed representative of the era of general-purpose architecture, which is currently at its peak.

Wells Fargo statistics put Nvidia's share of the global data center AI acceleration market at 98%, an absolutely dominant position. In other words, virtually the whole market is using Nvidia's CUDA to wring performance out of GPUs, while the remaining 2% or so insist on hammering the same nail with a lesser hammer.

"Now, whether it is Amazon or Intel, the processors they make themselves cannot economically satisfy the maximization of the interests of a cloud vendor, so they will definitely continue to use Nvidia chips in large quantities until one day Nvidia completely loses its advantage, and then they will go the path of specialization in the Makumoto cycle." The aforementioned person familiar with the matter said.

But lying flat does not suit Nvidia's temperament. Jensen Huang lives by the maxim "thrive in adversity, perish in comfort." In a speech at National Taiwan University a year ago, he put it this way: "Whether you are running for food or running to avoid becoming food, you often don't know which it is, but either way, keep running."

This time, facing the self-development challenge from across Silicon Valley, Nvidia is playing its own cards.

The aforementioned senior figure told Xinchao IC, "Nvidia's chips have long ceased to be purely general-purpose GPUs. Inside its GPU units you see large numbers of Tensor Cores for matrix computation, plus Transformer engines, sparsity engines, and more. In hardware structure and in hardware operator updates alike, Nvidia pushes itself toward DSA-style design every year."

A DSA (Domain-Specific Architecture) is a programmable processor architecture customized for a particular domain, optimizing performance and cost-effectiveness for specific applications. Google, Tesla, OpenAI, and Cerebras have all launched their own DSA chips for specific applications.
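As a software analogy for that trade-off (a sketch only, not how any of these chips actually work): a general-purpose primitive handles any workload, while a DSA-style routine hard-codes a single one, here Transformer attention, and can fuse all of its steps.

```python
# Toy software analogy for general-purpose vs. DSA hardware. NumPy stands
# in for silicon; the point is the shape of the trade-off, not performance.
import numpy as np

def general_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """General-purpose primitive: works for any shapes, any workload."""
    return a @ b

def dsa_style_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Specialized routine: computes only softmax(QK^T / sqrt(d)) V.
    It gives up generality but fuses every step of one fixed workload."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q, k, v = (np.random.randn(16, 64) for _ in range(3))
step = general_matmul(q, k.T)          # the general path composes primitives
out = dsa_style_attention(q, k, v)     # the DSA path does one thing, fused
print(step.shape, out.shape)           # (16, 16) (16, 64)
```

The specialized routine wins only as long as the workload stays fixed, which is precisely the fragility the next paragraph describes.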

According to insiders, every DSA maker discovers that even without touching its hardware, Nvidia need only update one operator on the general-purpose GPU for the DSA maker's advantage to vanish. Next to Nvidia, their logic dies are not large enough, their memory capacity and speed fall short, their computing power lags, and their software adaptation is weaker. This is the problem confronting all DSA makers: Makimoto's Wave was supposed to be swinging toward domain-specific, customized architectures, yet it has circled back to the era of the general-purpose processor.

In addition to "looking in the mirror" with DSA manufacturers, NVIDIA has also extended an olive branch to its own research and development. At the beginning of 2024, it established a new business unit led by semiconductor veteran Dina McKinney to build customized chips for customers in cloud computing, 5G telecommunications, gaming, automotive and other fields.

Reuters, citing people familiar with the matter, reported that Nvidia executives have met with representatives of Amazon, Meta, Microsoft, Google, and OpenAI to discuss building custom chips for them. Beyond data center chips, the company is also courting telecom, automotive, and video game customers.

Earlier reports said the new version of the Nintendo Switch due this year will most likely carry a custom Nvidia chip. Nvidia has deep experience in handheld gaming, having shipped the Tegra line of mobile chips, though that line never gained a foothold in the broader mobile device market.

In a market ruled by price-performance, being Nvidia's gravedigger is not easy. Most previous challengers failed and were eventually acquired, by buyers such as Intel and Google, which have absorbed many startups; most of the rest died before anyone would even buy them.

Perhaps it is easier for startups to succeed if they change their perspective.

"For example, don't be obsessed with the AI ​​processor itself. Instead of spending a lot of time to realize the ideal of a DSA, it is better to consider system-level solutions. For example, you can make peripherals to provide services for the AI ​​processor. Professional storage and professional sensors can also achieve the same purpose." said the aforementioned person familiar with the matter.

In 2019, Nvidia announced the acquisition of Mellanox for $6.9 billion, a price so high that it nearly exhausted Nvidia's cash flow.

Mellanox's focus was neither ports nor optical modules, nor switch boxes as such. Its core product was one thing: InfiniBand, the high-speed interconnect sitting between the switch layer and the underlying communication layer. For Nvidia, which urgently needed to break through the limits of server interconnection, this was a core peripheral capability. However good NVLink was at the time, it stayed locked within a single machine; InfiniBand lets switches break the interconnection bottleneck between servers and stitch all the GPUs into one large cluster.
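A toy calculation illustrates the single-machine wall described above. The payload and bandwidth figures are rough assumptions for illustration, not numbers from the article or from vendor specifications.

```python
# Toy model of why the inter-node fabric becomes the bottleneck.
# All figures below are rough, generation-dependent assumptions.
PAYLOAD_BYTES = 28e9   # e.g. gradients of a 14B-parameter model in fp16
NVLINK_BPS = 600e9     # assumed intra-node NVLink bandwidth, bytes/s
INFINIBAND_BPS = 50e9  # assumed per-node InfiniBand bandwidth, bytes/s

# A ring all-reduce moves roughly 2x the payload over its slowest link.
within_node = 2 * PAYLOAD_BYTES / NVLINK_BPS
across_nodes = 2 * PAYLOAD_BYTES / INFINIBAND_BPS

print(f"All-reduce inside one node:   {within_node:.2f} s")
print(f"All-reduce across the fabric: {across_nodes:.2f} s")
```

Under these assumptions the cross-node step is an order of magnitude slower, which is why a fast cluster fabric, not just a fast chip, was the peripheral Nvidia had to own.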

For now, China is in a feverish state of preparation, and the bullet of self-developed chips will have to fly a while longer. Chen Wei believes preparation is no bad thing, but it should be high-end preparation, so as not to miss the key window for industrial development.

Zhou Jiangong, founder of Weijin Research, takes the view further. He believes demand for specialized, customized, and miniaturized AI chips in future applications will exceed demand from cutting-edge foundation models. Training open-source, smaller models at lower cost, fine-tuning and distilling large models, and running inference all open broad space for self-developed chips. And while these new technologies remain fast-moving and immature, plenty of self-development opportunities will emerge around such applications.

“Don’t do what will end Nvidia, but do more than Nvidia.”