2024-10-03
Will NVIDIA's "GPU Festival" End?
Since the release of ChatGPT by OpenAI in the United States on November 30, 2022, generative AI (artificial intelligence) has become a major craze, and NVIDIA's GPUs have become sought after as AI semiconductors. However, GPU production has two bottlenecks: TSMC's middle process (packaging) and the high-bandwidth memory (HBM) built from stacked DRAM, and these have caused a global GPU shortage.
Among these GPUs, the "H100" was in particularly high demand, with its price soaring to $40,000, triggering the so-called NVIDIA "GPU festival."
Against this backdrop, TSMC doubled its interposer production capacity in the middle process, and DRAM makers such as SK hynix increased HBM output, shortening the "H100" lead time from 52 weeks to 20 weeks.
So, will NVIDIA's "GPU festival" end?
In this article, we discuss whether NVIDIA's "GPU festival" is coming to an end. To state the conclusion first: even in 2024, the high-end AI servers (defined later) required to develop and operate ChatGPT-level generative AI are expected to account for only 3.9% of server shipments, so the demand from cloud service providers (CSPs) such as Google, Amazon, and Microsoft cannot possibly be met. In short, NVIDIA's "GPU festival" so far is only the beginning; a full-scale generative AI boom is on its way.
Next, let's briefly review the two major bottlenecks in NVIDIA GPU production.
Two NVIDIA GPU bottlenecks
In NVIDIA GPU production, the foundry TSMC handles all of the front-end, middle, and back-end processes. The middle process here refers to fabricating the GPU, CPU, HBM, and other chips separately and then mounting them on a square substrate cut from a 12-inch silicon wafer. That substrate is called a silicon interposer (Figure 1).
Figure 1: The middle process, which is evolving from 2.5D to 3D, as used in NVIDIA GPUs (Source: Tadashi Kamewada)
The NVIDIA GPU packaging technology developed by TSMC is called CoWoS (Chip on Wafer on Substrate), and its two bottlenecks are silicon interposer capacity and HBM (Figure 2). The situation is as follows.
Figure 2: CoWoS structure and the two bottlenecks in NVIDIA GPUs (Source: WikiChip)
CoWoS was developed in 2011. Since then, as GPU performance has improved, GPU die sizes have kept growing and the number of HBM stacks mounted around the GPU has increased (Figure 3). As a result, the silicon interposer gets larger every year, while the number of interposers obtainable from a single wafer decreases in inverse proportion.
Figure 3: Interposer area and HBM count increase with each generation (Source: KC Yee, TSMC)
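To get an intuition for that inverse relationship, here is a minimal sketch using the standard gross-die-per-wafer approximation; the interposer dimensions, edge exclusion, and scribe width are illustrative assumptions, not TSMC figures:

```python
import math

def interposers_per_wafer(side_mm, wafer_d_mm=300, edge_mm=3, scribe_mm=0.2):
    """Rough gross-die count for square interposers on a round wafer.

    Classic approximation: usable wafer area / die area, minus an
    edge-loss term. All parameters here are illustrative assumptions.
    """
    s = (side_mm + scribe_mm) ** 2          # interposer area incl. scribe lines (mm^2)
    r = wafer_d_mm / 2 - edge_mm            # usable radius after edge exclusion (mm)
    return int(math.pi * r ** 2 / s - math.pi * 2 * r / math.sqrt(2 * s))

# Hypothetical interposer sizes (mm per side) to show the trend:
for side in (33, 42, 55):                   # roughly 1,100 / 1,800 / 3,000 mm^2
    print(f"{side} x {side} mm interposer -> ~{interposers_per_wafer(side)} per 300 mm wafer")
```

In this toy calculation, roughly tripling the interposer area cuts the number obtainable per 300 mm wafer from about 40 to about 10, which is why capacity measured in wafers per month translates into fewer and fewer GPUs as interposers grow.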
Moreover, as the number of HBM stacks per GPU grows, the number of DRAM dies stacked inside each HBM also increases. On top of that, DRAM is miniaturized every two years and the HBM standard is updated every two years to improve performance, so cutting-edge HBM is chronically in short supply.
Under these circumstances, TSMC is doubling its silicon interposer production capacity, from about 15,000 wafers per month around the summer of 2023 to more than 30,000 wafers per month around the summer of this year. In addition, Samsung Electronics and Micron Technology have obtained NVIDIA's certification and begun supplying cutting-edge HBM, which had previously been dominated by SK hynix.
As a result, the lead time for the NVIDIA "H100", the product in highest demand, has been cut sharply from 52 weeks to 20 weeks. So, how much have AI server shipments increased as a result?
Definitions of the two types of AI servers
According to "Global Annual Server Shipments, 2023-2024" (Servers Report Database, 2024) released by DIGITIMES Research, AI servers are classified into two types:
Systems equipped with two or more AI accelerators but without HBM are called "general-purpose AI servers."
Systems equipped with four or more AI accelerators that carry HBM are called "high-end AI servers."
An AI accelerator here means specialized hardware designed to speed up AI applications, in particular neural networks and machine learning; the typical example is NVIDIA's GPU. Developing and operating ChatGPT-level generative AI requires large numbers of high-end AI servers, not general-purpose AI servers.
So, how many general-purpose AI servers and high-end AI servers are being shipped?
General-purpose AI server and high-end AI server shipments
Figure 4 shows shipments of general-purpose AI servers and high-end AI servers from 2022 to 2024. General-purpose AI server shipments were 344,000 units in 2022 and 470,000 units in 2023, and are expected to reach 725,000 units in 2024.
Figure 4: General-purpose AI server and high-end AI server shipments, 2022-2024 (Source: DIGITIMES Research)
Meanwhile, shipments of the high-end AI servers needed to develop and operate ChatGPT-level generative AI were 34,000 units in 2022 and 200,000 units in 2023, and are expected to reach 564,000 units in 2024.
So, can these high-end AI server shipments meet the needs of the US CSPs?
Figure 5 shows shipments of servers overall, general-purpose AI servers, and high-end AI servers. When I drew this chart and looked at it, I was stunned: "Is this really all the high-end AI servers being shipped?" Compared with servers as a whole, shipments of both general-purpose AI servers and high-end AI servers are tiny.
Figure 5: Shipments of servers, general-purpose AI servers, and high-end AI servers (Source: author, based on MIC and DIGITIMES data)
I was even more dismayed when I looked into how many high-end AI servers are needed to develop and run ChatGPT-level generative AI.
High-end AI servers needed for ChatGPT-level generative AI
It is reported that developing and operating ChatGPT requires 30,000 NVIDIA "DGX H100" high-end AI servers (Figure 6). When I saw that figure of 30,000 units, I felt dizzy.
Figure 6: How many high-end AI servers are needed to run ChatGPT? (Source: HPC website)
Incidentally, the NVIDIA "DGX H100" carries eight "H100" chips, and with the price of each chip having soared to $40,000, the total system price comes to $460,000. In other words, building ChatGPT-level generative AI requires an investment of 30,000 units × $460,000 = $13.8 billion (roughly 2 trillion yen at $1 = 145 yen!).
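As a quick sanity check on that arithmetic, here is a minimal sketch; the unit counts, prices, and exchange rate are simply the figures quoted above:

```python
# Back-of-the-envelope check of the figures quoted in the text.
DGX_H100_SYSTEMS = 30_000      # systems reportedly needed for ChatGPT-level AI
GPUS_PER_SYSTEM = 8            # H100 chips per DGX H100
GPU_PRICE_USD = 40_000         # soaring price per H100 chip
SYSTEM_PRICE_USD = 460_000     # price of a complete DGX H100 system
USD_JPY = 145                  # exchange rate assumed in the text

gpus_only = DGX_H100_SYSTEMS * GPUS_PER_SYSTEM * GPU_PRICE_USD
whole_systems = DGX_H100_SYSTEMS * SYSTEM_PRICE_USD
print(f"GPUs alone   : ${gpus_only / 1e9:.1f} billion")
print(f"Full systems : ${whole_systems / 1e9:.1f} billion "
      f"(about {whole_systems * USD_JPY / 1e12:.1f} trillion yen)")
```

The GPUs alone come to $9.6 billion; the gap up to $13.8 billion is, roughly speaking, everything else in the system: CPUs, memory, networking, chassis, and so on.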
The world seems to be overflowing with generative AI systems, but how many ChatGPT-class generative AIs have actually been (or can actually be) built? (Figure 7)
Figure 7: Server shipments, high-end AI server shipments, and the number of ChatGPT-level generative AI systems that can be built (Source: MIC and DIGITIMES)
Since high-end AI server shipments in 2022 were 34,000 units, only one ChatGPT-level AI system could be built (that one being ChatGPT itself). In 2023, with shipments reaching 200,000 units, 6 to 7 ChatGPT-level systems could be built. And with 564,000 units expected to ship in 2024, 18 to 19 ChatGPT-level systems could be built.
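The same assumption of 30,000 DGX H100-class servers per system gives those counts directly; a minimal sketch:

```python
# High-end AI server shipments (units) quoted above, divided by the
# ~30,000 servers assumed per ChatGPT-level system.
SERVERS_PER_SYSTEM = 30_000
shipments = {2022: 34_000, 2023: 200_000, 2024: 564_000}

for year, units in shipments.items():
    systems = units / SERVERS_PER_SYSTEM
    print(f"{year}: {units:>7,} servers -> ~{systems:.1f} ChatGPT-class systems")
```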
However, this estimate assumes that a ChatGPT-level AI can be built with 30,000 NVIDIA "DGX H100" high-end AI servers. Since generative AI is likely to keep growing more complex, more than 30,000 DGX H100s may well be needed. All things considered, the US CSPs are unlikely to be satisfied with current shipments of high-end AI servers.
Now, let's look at how many high-end AI servers each end user (such as the US CSPs) holds.
Number of high-end AI servers by end user
Figure 8 shows the number of high-end AI servers held by each end user. In 2023, Microsoft, the backer of OpenAI, held the most at 63,000 units, but in 2024 Google is expected to overtake Microsoft for the top spot.
Figure 8: High-end AI servers by end user, 2023-2024 (Source: DIGITIMES Research)
The top five in 2024 are Google in first place with 162,000 units (5 systems), Microsoft in second with 90,000 units (3 systems), Supermicro in third with 68,000 units (2 systems), Amazon in fourth with 67,000 units (2 systems), and Meta in fifth with 46,000 units (1 system), where the figures in parentheses are the number of ChatGPT-class generative AI systems each could build. These top five US companies monopolize roughly 80% of the share.
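A quick check of that share, using the 2024 holdings above against the 564,000-unit 2024 shipment total from Figure 4 (the report may use a slightly different denominator):

```python
# Top-five 2024 holdings (units) from Figure 8 versus total 2024
# high-end AI server shipments (Figure 4).
TOTAL_2024 = 564_000
top5 = {"Google": 162_000, "Microsoft": 90_000, "Supermicro": 68_000,
        "Amazon": 67_000, "Meta": 46_000}

held = sum(top5.values())
print(f"Top five hold {held:,} units, about {held / TOTAL_2024:.0%} of 2024 shipments")
```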
Next, let's look at high-end AI server shipments by AI accelerator (Figure 9). As expected, NVIDIA's GPUs are used the most, reaching 336,000 units in 2024. Surprisingly, however, second place goes not to AMD but to Google.
Figure 9: High-end AI servers by AI accelerator, 2023-2024 (Source: DIGITIMES Research)
Google developed its own Tensor Processing Unit (TPU) as an AI accelerator, and in 2024 the number of high-end AI servers equipped with TPUs is expected to reach 138,000. From Figure 8, Google will hold 162,000 high-end AI servers in 2024; of these, 138,000 are expected to carry Google's own TPUs, leaving the remaining 24,000 with NVIDIA GPUs. In other words, for NVIDIA, Google is both a customer and a formidable rival.
Looking further at 2024 shipments, AMD is third with 45,000 units, followed by Amazon in fourth with 40,000 units. Amazon is also developing AWS Trainium as an AI accelerator, and before long AMD may be overtaken by Amazon.
To sum up, NVIDIA currently ships by far the most AI accelerators, but Google and Amazon are emerging as strong competitors. NVIDIA's real rivals are not the processor makers AMD (let alone the ailing Intel), but the US CSPs Google and Amazon.
A full-scale generative AI boom is coming
Let's summarize the discussion so far. According to DIGITIMES Research, shipments of the high-end AI servers needed to develop and run ChatGPT-level generative AI are expected to account for only 3.9% of all server shipments in 2024. This volume simply cannot meet the demand from the CSPs.
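Those two numbers also imply the scale of the overall server market behind Figure 5; the back-calculation below is an inference from the figures above, not a number stated in the report:

```python
# The 3.9% share and the 564,000-unit high-end figure together imply
# the approximate total server shipments for 2024.
HIGH_END_2024 = 564_000
HIGH_END_SHARE = 0.039

implied_total = HIGH_END_2024 / HIGH_END_SHARE
print(f"Implied total 2024 server shipments: about {implied_total / 1e6:.1f} million units")
```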
In other words, NVIDIA's "GPU festival" of 2023-2024 is just the beginning, and a full-blown generative AI boom is likely still ahead. The grounds for this are shown below.
Figure 10 shows the semiconductor market by application and its forecast, published by the Semiconductor Industry Association (SIA). According to the SIA, the global semiconductor market will exceed US$1 trillion in 2030.
Figure 10: Semiconductor shipment forecast by application (Source: SIA blog)
The largest market in 2030 will be computing and data storage. This category includes PCs and servers (including, of course, high-end AI servers), but since PC shipments are unlikely to grow much, servers will account for most of it.
Wired communications refers here to semiconductors used in data centers. In other words, by 2030, computing and data storage ($330 billion) plus wired communications ($60 billion), a total of $390 billion, will be semiconductors for data centers (including PCs), making this the world's largest segment.
Another thing to watch is the data center market and its outlook, shown in Figure 11. Since the release of ChatGPT in 2022, the data center market has been expected to grow steadily. Data centers consist of three elements: network infrastructure, servers, and storage, and the server and storage segments are each expected to roughly double in size from 2023 to 2029.
Figure 11: Data center market outlook (the full-scale generative AI boom has not yet arrived) (Source: author, based on Statista Market Insights data)
Thus, semiconductors for servers (including high-end AI servers) will occupy the largest share of the global market, and the data center market will continue to expand.
To repeat one last time: so far, NVIDIA's "GPU festival" is only the warm-up act. A full-blown generative AI boom is coming.