
Nvidia's "root cause of decline": with cutting-edge chips, the stronger the performance, the harder they are to manufacture

2024-08-31


Author | Gao Zhimou

Editor | Hard AI

The "root cause of Nvidia's decline" can be summed up in one sentence: with cutting-edge chips, the more powerful the performance, the harder the chip is to manufacture.

Nvidia shares fell 6.4% on Thursday. On Wednesday the company had reported strong quarterly sales and profits, but it also noted that manufacturing challenges with its new chips had lowered profit margins and that it took a $908 million provision in the latest quarter.

In a statement, the company acknowledged that the Blackwell-architecture GPU has yield problems and that part of the B200 processor had to be redesigned to improve yield. As a result, mass production of the next-generation Blackwell GPU will be postponed to the fourth quarter of 2024:

“We have adjusted the design of the Blackwell GPU to improve production yield. Blackwell production is scheduled to ramp up in the fourth quarter and continue into fiscal 2026.

We expect Blackwell products to generate billions of dollars in revenue in the fourth quarter.”

Nvidia did not elaborate on the exact cause of the problem, but analysts and industry executives believe the engineering challenges stem mainly from the complex manufacturing process that the Blackwell design requires.

Analysts point out that Blackwell's huge size and complex design make it exceptionally hard to manufacture. A defect in any one component can cause the whole chip to be scrapped, which hurts both yield and profit. In addition, differences in the thermal expansion coefficients of the chip's various parts can cause the package to warp, affecting performance and reliability.

To improve yield, Nvidia has adjusted the Blackwell design, and it still plans to ramp up production on schedule. However, analysts believe that the complexity of TSMC's new chip connection technology and the inherent challenges of the chip's size will remain the main obstacles to mass-producing Blackwell.

G. Dan Hutcheson, vice president of industry analyst firm TechInsights, said:

"The problem is getting the chips to work together and have good yields. When the yields of individual parts of the chip aren't high enough, everything can go bad very quickly."

01

The complexity of the Blackwell chip

To maintain its lead in artificial intelligence chips, Nvidia (NVDA) is betting that "bigger is better". But while a larger chip delivers stronger performance, it is also harder to manufacture.

Nvidia's latest AI chip, Blackwell, has been described by CEO Jensen Huang as a "very, very large GPU". Physically, it is indeed the largest GPU currently available. It consists of two Blackwell dies joined together, is built on TSMC's 4nm process, and has 208 billion transistors, 2.6 times as many as its predecessor.

UBS analysts said in a report earlier this month that Nvidia's main problem with Blackwell is that TSMC's new CoWoS-L packaging method is too complicated.

SemiAnalysis, a semiconductor industry publication, reported that the packaging technology uses an RDL interposer with local silicon interconnect (LSI) bridges to connect the chiplets, with a transfer rate of about 10 TB/s. These bridges must be placed with extremely high accuracy: a defect in any one component can cause the entire chip, worth $40,000, to be scrapped, which hurts both yield and profit.
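The way individual yields compound can be illustrated with a little arithmetic. The sketch below is only a rough model, not Nvidia's or TSMC's actual yield math: it assumes defects are independent, so the yield of the package is the product of the yields of its parts, and the 95% per-part yield is a made-up figure. Only the $40,000 chip value comes from the report above.

```python
from math import prod


def package_yield(part_yields):
    """Probability that every part is good, assuming independent defects."""
    return prod(part_yields)


def expected_scrap_value(unit_value, part_yields):
    """Average value lost per assembled unit when any bad part scraps the unit."""
    return unit_value * (1.0 - package_yield(part_yields))


CHIP_VALUE_USD = 40_000    # figure quoted in the article
ASSUMED_PART_YIELD = 0.95  # hypothetical, for illustration only

if __name__ == "__main__":
    # More parts that must all be good means a lower combined yield.
    for parts in (2, 4, 8):
        yields = [ASSUMED_PART_YIELD] * parts
        print(
            f"{parts} parts: yield {package_yield(yields):.1%}, "
            f"expected scrap per unit "
            f"${expected_scrap_value(CHIP_VALUE_USD, yields):,.0f}"
        )
```

Under these assumptions the combined yield falls with every added part that has to work, which is the effect the analysts describe.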

In addition, mismatched coefficients of thermal expansion (CTE) among the GPU dies, the LSI bridges, the RDL interposer and the motherboard substrate caused chips to warp and systems to fail. According to reports, Nvidia had to redesign the top metal layer and the bumps of the GPU chip to improve yield.
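To give a sense of scale for a CTE mismatch, here is a minimal sketch using the linear expansion relation ΔL = α · L · ΔT. Every number in it is an illustrative assumption rather than Blackwell data: silicon is taken at roughly 2.6 ppm/K, while the substrate value of 15 ppm/K, the 50 mm span and the 100 K temperature swing are all hypothetical.

```python
def differential_expansion_um(alpha_a, alpha_b, length_mm, delta_t_k):
    """Difference in linear expansion between two bonded materials, in micrometres.

    alpha_a, alpha_b: coefficients of thermal expansion in 1/K
    length_mm: shared span in millimetres
    delta_t_k: temperature change in kelvin
    """
    delta_mm = abs(alpha_a - alpha_b) * length_mm * delta_t_k
    return delta_mm * 1000.0  # mm -> micrometres


# All values below are assumptions for illustration, not measured Blackwell data.
SILICON_CTE = 2.6e-6     # 1/K, approximate value for silicon
SUBSTRATE_CTE = 15e-6    # 1/K, hypothetical organic substrate
SPAN_MM = 50.0           # hypothetical package span
TEMP_SWING_K = 100.0     # hypothetical temperature change

if __name__ == "__main__":
    mismatch = differential_expansion_um(
        SILICON_CTE, SUBSTRATE_CTE, SPAN_MM, TEMP_SWING_K
    )
    print(f"Differential expansion: {mismatch:.1f} micrometres")
```

Because the layers are bonded together, a length difference of this kind cannot be released freely and instead shows up as stress and warping.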

Jensen Huang emphasized on a conference call with analysts that the Blackwell chip does not require any "functional changes" and that all the adjustments are aimed at improving yield.

Chief Financial Officer Colette Kress said Nvidia is on track to ramp up Blackwell production and expects Blackwell to bring the company billions of dollars in revenue in the fourth quarter.

02

Micron adds DRAM expansion plan

According to Japanese media reports, Micron plans to build a new DRAM chip production plant in Hiroshima Prefecture, Japan, with the goal of putting it into operation as early as the end of 2027.

Such problems are not unique to Nvidia. Industry insiders say they are becoming more common as chipmakers try to raise processing power by increasing chip size. Design changes to eliminate defects or improve yield are also routine in the industry.

Lisa Su, CEO of chip giant AMD, also pointed out that as chips keep getting larger, manufacturing complexity will inevitably increase. The next generation of chips will need breakthroughs in energy efficiency and power consumption to meet the huge demand for computing power in AI data centers.

“It’s going to take a lot of technical investment to make these technologies work,” she said. “Are they going to get more complex and bigger? No doubt about it. That’s our reality.”

To get past the size limit of a single chip, Nvidia combined two chips of the largest possible size to create Blackwell. This aggressive strategy has drawn skepticism from competitors.

Andrew Feldman, founder of rival Cerebras Systems, believes that the difficulty of multi-chip combination technology will increase exponentially. Cerebras Systems chose instead to develop a single giant chip, and has launched an AI cloud computing service based on it in an attempt to challenge Nvidia's market position.

Andrew Feldman said:

“Doing meaningful work in the field of artificial intelligence requires a lot of computing power, which requires a lot of transistors, more than can fit on a single chip…

It’s hard to develop two-chip technology, it’s harder to develop four-chip technology, and it’s even harder to develop eight-chip technology.”

Whether Nvidia's giant-chip strategy will ultimately win out remains to be tested by the market. What is certain is that the ultimate challenge in chip manufacturing has only just begun.