
Who can become Nvidia's alternative?

2024-09-23


Author: Barry

Edited by Guan Ju

Image source: Midjourney

Who can replace Nvidia?

In the field of data-center GPUs, Nvidia shipped 3.76 million units in 2023, taking nearly 98% of the global market, a position that can fairly be called unmatched.

AI chips, also known as AI accelerators or compute cards, are modules designed specifically to handle the heavy computational workloads of artificial intelligence applications. They mainly include graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).

According to Gartner, the AI chip market reached $53.4 billion in 2023, up 20.9% from 2022, and will grow 25.6% to $67.1 billion in 2024. By 2027, AI chip revenue is expected to more than double the 2023 figure, reaching $119.4 billion.

The computing-power arms race among major companies has undoubtedly become a strong driving force for the AI chip market.

Since 2024, mainstream large models have almost all passed 100 billion parameters: Llama 3 has 400 billion, and GPT-4 reportedly has 1.8 trillion. A trillion-parameter model calls for a super-sized compute cluster of more than 10,000 cards.
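
To see why the 10,000-card figure is plausible, here is a rough back-of-envelope sketch. It assumes the commonly cited 6·N·D estimate of training FLOPs, a 20-tokens-per-parameter data budget, H100-class throughput, and a one-year training window; all of these are illustrative assumptions rather than figures from this article.

    # Back-of-envelope: GPUs needed to train a 1-trillion-parameter model.
    # Assumptions (illustrative, not from the article): 6*N*D training FLOPs,
    # ~20 tokens per parameter, ~1 PFLOP/s peak BF16 per H100-class GPU,
    # 40% sustained utilization, one-year training window.
    N = 1e12                       # parameters
    D = 20 * N                     # training tokens (assumed ratio)
    total_flops = 6 * N * D        # ~1.2e26 FLOPs

    peak = 1e15                    # per-GPU peak FLOP/s (H100-class, BF16)
    mfu = 0.40                     # assumed sustained utilization
    effective = peak * mfu         # ~4e14 FLOP/s actually delivered per GPU

    seconds_per_year = 365 * 24 * 3600
    gpus = total_flops / (effective * seconds_per_year)
    print(f"GPUs for a one-year training run: {gpus:,.0f}")  # ~9,500

Under these assumptions the answer lands right around the 10,000-card cluster size cited above.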

OpenAI has at least 50,000 high-end Nvidia GPUs, Meta has built its own cluster of 24,576 H100s, and Google has its A3 supercomputer of 26,000 H100s... More than 40,000 companies have purchased Nvidia GPUs, and Meta, Microsoft, Amazon, and Google together contribute 40% of its revenue.

Financial reports show that Nvidia's gross margin has reached 71%, with margins on the A100 and H100 series as high as 90%. For a hardware company, Nvidia's gross margin is higher even than that of internet companies.

Nvidia's data-center AI chips are reported to sell for $25,000 to $40,000 each, 7-8 times the price of traditional products. Kazuhiro Sugiyama, consulting director at research firm Omdia, has said that Nvidia's products are expensive, a burden for companies that want to invest in AI.

The high prices have pushed many major customers to look for alternatives. On July 30, Apple disclosed that its AI model had been trained on 8,000 Google TPUs. OpenAI's first chip has also recently come to light: it will reportedly use TSMC's most advanced A16 angstrom-class process and is designed specifically for Sora video applications.

Globally, star AI chip startups and unicorns have emerged one after another, each trying to grab a piece of Nvidia's market. Among them are the Chinese-founded unicorn SambaNova; the newly emerged Etched; and Cerebras Systems, a unicorn backed by OpenAI CEO Sam Altman that is preparing for an IPO. And after successfully listing Arm last year, SoftBank Group chairman Masayoshi Son acquired British AI chip company Graphcore in July this year, aiming to build the next Nvidia.

SambaNova, an AI chip unicorn created by Chinese founders out of Stanford

On August 27, American AI chip startup SambaNova gave the first detailed look at its newly launched AI chip, the SN40L, the world's first AI chip system for trillion-parameter AI models built on a reconfigurable dataflow unit (RDU).

According to the company, an 8-chip system based on the SN40L can support a 5-trillion-parameter model, with sequence lengths of 256k+ on a single system node. Compared with Nvidia's H100, the SN40L delivers 3.1 times the inference performance and 2 times the training performance, at one-tenth the total cost of ownership.

SambaNova CEO Rodrigo Liang

The company's three co-founders all have Stanford backgrounds: CEO Rodrigo Liang is a former vice president of engineering at Sun/Oracle, and the other two co-founders are Stanford professors. The team also includes many Chinese engineers.

SambaNova is currently valued at US$5 billion (approximately RMB 36.5 billion) and has completed six funding rounds totaling US$1.1 billion, with investors including Intel, SoftBank, Samsung, and Google Ventures.

The company wants to challenge Nvidia not only on chips but also on business model, where it goes a step further: directly helping enterprises train private large models. It does not sell chips alone; it sells a customized technology stack, from chips to server systems, and even the deployment of large models.

Its ambitions for target customers are greater still: the world's 2,000 largest companies. SambaNova's chips and systems have already won major customers, including the world's top supercomputing laboratories (Japan's Fugaku, the U.S. Argonne National Laboratory, and Lawrence Livermore National Laboratory) as well as the consulting firm Accenture.

Rodrigo Liang believes the next battleground for commercializing large models and generative AI is the private data of enterprises, especially large ones. In the end, he argues, an enterprise will not run one super-large model like GPT-4 or Google Gemini, but will create 150 unique models built on different subsets of its data, with more than a trillion aggregate parameters.

This strategy stands in stark contrast to the approach behind GPT-4 and Google Gemini, where most of the giants hope to build a single giant model that can generalize across millions of tasks.

Etched, an AI chip company founded by two Harvard dropouts born in the 2000s

Etched's founders are two Harvard dropouts born in the 2000s. Gavin Uberti has held senior positions at OctoML and Xnor.ai, while Chris Zhu, who is of Chinese descent, taught and did research in computer science at Harvard University and has internship experience at companies such as Amazon.

Bullish on large models even before ChatGPT was released, the two dropped out of Harvard in 2022 and founded Etched together with Robert Wachen and former Cypress Semiconductor CTO Mark Ross to build chips dedicated to large AI models.

Gavin Uberti (left) and Chris Zhu (right)

They took a unique route: an ASIC that can only run Transformer models. Nearly every solution on the market supports a broad range of AI models, but since late 2022 the pair have been convinced that the Transformer will dominate the entire market, that GPU performance is improving too slowly, and that only a specialized ASIC can deliver a leap in performance.

Two years later, on June 27 this year, Etched launched its first AI chip, Sohu, the world's first chip dedicated to Transformer computation.

According to Etched, Sohu runs large models 20 times faster than Nvidia's H100 and more than 10 times faster than the top-end B200 launched in March this year; a server fitted with eight Sohu chips can replace a full 160 Nvidia H100 GPUs, slashing costs with no loss of performance.

Because Sohu supports only one algorithm, most of the control-flow logic can be eliminated, letting the chip pack in more math units and reach over 90% compute utilization, versus roughly 30% for a GPU. For a relatively small design team, maintaining a single-architecture software stack is also far less demanding.
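
For context on those figures, compute utilization is conventionally measured as achieved FLOP/s divided by the hardware's peak FLOP/s. The sketch below shows that calculation for Transformer inference, using the common approximation of about 2 FLOPs per parameter per generated token; the model size, throughput, and peak numbers are hypothetical, not Etched's published data.

    # Compute utilization = achieved FLOP/s / peak FLOP/s.
    # For transformer inference, a standard estimate is ~2 FLOPs per
    # parameter per generated token. All numbers are hypothetical.
    def utilization(params: float, tokens_per_sec: float, peak_flops: float) -> float:
        achieved = 2 * params * tokens_per_sec   # approx. delivered FLOP/s
        return achieved / peak_flops

    # A 70B-parameter model generating 2,000 tokens/s on a 1 PFLOP/s chip:
    print(f"{utilization(70e9, 2000, 1e15):.0%}")  # ~28%, GPU-like territory

By this yardstick, the gap Etched describes (90%+ versus roughly 30%) is the difference between keeping the math units busy almost constantly and leaving most of them idle.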

Alongside the Sohu launch, Etched announced that it had closed a US$120 million Series A round led by Primary Venture Partners and Positive Sum Ventures.

Other notable investors in the round include well-known Silicon Valley investor Peter Thiel; Balaji Srinivasan, former CTO of cryptocurrency exchange Coinbase and former general partner at a16z; GitHub CEO Thomas Dohmke; Cruise co-founder Kyle Vogt; and Quora co-founder Charlie Cheever.

Cerebras Systems, an Altman-backed AI chip unicorn sprinting toward an IPO

The most distinctive thing about Cerebras Systems, founded in 2015, is how different its chips are from mainstream Nvidia GPUs. Under the guidance of Moore's Law, chips have historically grown smaller and smaller; Nvidia's H100, for example, packs 80 billion transistors onto a die of 814 square millimeters.

Cerebras' AI chip goes the other way, growing bigger and bigger; the company claims to have "created the world's largest chip". Its WSE-3 is cut from an entire wafer, is larger than a dinner plate, and must be held with both hands. A single WSE-3 packs 4 trillion transistors (50 times the H100's count) onto a die of more than 46,000 square millimeters.

A chip larger than a dinner plate has to be held with both hands. Image source: Ars Technica

Cerebras claims its chips can train AI models 10 times larger than today's industry leaders, such as OpenAI's GPT-4 or Google's Gemini.

On August 27 this year, Cerebras Systems launched Cerebras Inference, an AI inference service it bills as the "fastest in the world". According to the company's website, the service is 20 times faster than Nvidia-based offerings at the same accuracy; its processor's memory bandwidth is 7,000 times that of an Nvidia GPU, while the price is only one-fifth that of GPU services, a 100-fold improvement in price-performance. Cerebras Inference is offered in multiple tiers, including Free, Developer, and Enterprise, to cover needs from small-scale development to large enterprise deployment.
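
The memory-bandwidth claim is central, because at small batch sizes every generated token requires streaming all of the model's weights through the processor, so decoding speed is roughly bandwidth divided by model size. The sketch below illustrates that relationship; the bandwidth and model-size figures are rough illustrations, not Cerebras' published specifications.

    # Why memory bandwidth dominates LLM inference: for batch-1 decoding,
    # each token must read every weight, so
    #     tokens/s  ~=  memory bandwidth / model size in bytes.
    # Figures below are rough illustrations, not vendor specifications.
    def tokens_per_sec(bandwidth_bytes: float, params: float,
                       bytes_per_param: float = 2.0) -> float:
        return bandwidth_bytes / (params * bytes_per_param)

    model = 70e9  # a 70B-parameter model in fp16 -> 140 GB of weights

    # HBM-class GPU at ~3.35 TB/s vs. wafer-scale on-chip SRAM at ~21 PB/s:
    print(f"{tokens_per_sec(3.35e12, model):,.0f} tokens/s")  # ~24
    print(f"{tokens_per_sec(2.1e16, model):,.0f} tokens/s")   # ~150,000

A chip that keeps weights in enormous on-chip SRAM rather than external HBM can therefore decode orders of magnitude faster per user, which is the architectural bet behind wafer-scale inference.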

Co-founder and CEO Andrew Feldman holds an MBA from Stanford University, and chief technology officer Gary Lauterbach is recognized as one of the industry's top computer architects. In 2007 the two co-founded the microserver company SeaMicro, which AMD acquired for $334 million in 2012, after which both joined AMD.

According to media reports, Cerebras Systems has confidentially filed for a U.S. IPO and could list as early as October 2024. The company has raised $720 million to date and is valued at roughly $4.2 billion to $5 billion. One of its largest individual investors is OpenAI CEO Sam Altman, who reportedly participated in Cerebras' $81 million Series D round.

Tenstorrent, joined by a legendary chip designer, aims to become an "alternative" to Nvidia

Before 2021, Tenstorrent was a little-known company. That changed when Jim Keller, the semiconductor legend nicknamed the "Silicon Sage", announced he would join as chief technology officer and president.

Jim Keller's career reads like a history of the computer industry: from 1998 to 1999 he worked at AMD on the K7/K8 architectures behind Athlon; from 2008 to 2012 he led development of the A4 and A5 processors at Apple; from 2012 to 2015 he oversaw the K12 Arm project and the Zen architecture project back at AMD; from 2016 to 2018 he developed the FSD autonomous-driving chip at Tesla; and from 2018 to 2020 he worked on an undisclosed project at Intel.

Jim Keller joined Tenstorrent in the hope of providing a "low-cost alternative" to Nvidia's expensive GPUs. In his view, Nvidia serves certain markets poorly, and those are precisely the markets Tenstorrent wants to capture.

Tenstorrent says its Galaxy system is three times more efficient and 33% less expensive than Nvidia's DGX, the world's most popular AI server.

Tenstorrent is reportedly on track to release its second-generation multi-purpose AI processor by the end of this year. According to the roadmap it published last fall, the company intends to release its Blackhole standalone AI processor and its Quasar low-power, low-cost chiplet for multi-chiplet AI solutions.

The company claims its upcoming processors offer performance efficiency comparable to Nvidia's AI GPUs. At the same time, Tenstorrent says its architecture consumes less memory bandwidth than its competitors', a key reason for its higher efficiency and lower cost.

The defining feature of the Tenstorrent chip is that each of its more than 100 cores carries a small CPU, a "brain within the brain". Each core can "think" for itself, deciding which data to process first, or whether to abandon tasks it deems unnecessary, thereby improving overall efficiency.
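
As a toy illustration of that idea (a hypothetical sketch, not Tenstorrent's actual firmware or instruction set), the snippet below models a core that prioritizes its own work queue and drops tiles that would contribute nothing, such as blocks that are entirely zero:

    # Toy model of per-core local scheduling (hypothetical illustration,
    # not Tenstorrent's implementation): a core reorders its own queue
    # and skips work that contributes nothing to the result.
    from dataclasses import dataclass, field

    @dataclass
    class Tile:
        priority: int
        values: list

    @dataclass
    class Core:
        queue: list = field(default_factory=list)

        def step(self):
            """Process the highest-priority tile; drop all-zero tiles."""
            if not self.queue:
                return None
            self.queue.sort(key=lambda t: t.priority, reverse=True)
            tile = self.queue.pop(0)
            if not any(tile.values):          # unnecessary work: skip it
                return None
            return sum(v * v for v in tile.values)  # stand-in for real math

    core = Core(queue=[Tile(1, [0.0, 0.0]), Tile(5, [1.0, 2.0])])
    print(core.step())  # handles the priority-5 tile first -> 5.0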

Tenstorrent has completed at least six funding rounds to date. Its earlier investors were mainly venture capital firms, but after Jim Keller joined, the company closed a new US$100 million round in August 2023 in which industrial capital appeared for the first time: Hyundai Motor Group and Samsung Catalyst Fund, Samsung's venture capital arm.

SoftBank buys Graphcore at a discount to create an Nvidia rival

Graphcore was founded in 2016 by CTO Simon Knowles and CEO Nigel Toon. The company is dedicated to developing the Intelligence Processing Unit (IPU), a processor designed specifically for artificial intelligence and machine learning, with a distinctive architecture and advantages such as a massively parallel MIMD design, high memory bandwidth, and tightly coupled, locally distributed SRAM.

Graphcore has launched a number of IPU-based products, such as the GC200 IPU processor and the Bow IPU, and has continued to upgrade and improve its technology.

In July this year, however, the troubled British AI chip company was acquired by SoftBank.

Under the agreement, Graphcore becomes a wholly owned subsidiary of SoftBank and continues to operate under its current name. The total transaction value is reported at around 400 million pounds (about US$500 million, or RMB 3.56 billion), roughly 82% below the US$2.8 billion valuation from Graphcore's last funding round. In effect, SoftBank picked Graphcore up for about a fifth of its former price.

Graphcore was once hailed as the "British Nvidia". Since 2020, however, the company has attracted no new investment and has lost important orders from Microsoft, leaving it financially strained, operationally struggling, and unable to keep pace with the broader AI chip boom. Meanwhile, the continued tightening of U.S. export controls on AI semiconductors for China hampered Graphcore's business there, ultimately forcing it to exit the Chinese market and give up a quarter of its total revenue.

The acquisition of Graphcore not only consolidates SoftBank's position in AI chips; it is also an important step in Masayoshi Son's AI strategy.

A former Google engineer founded Groq and created a new species, the LPU

In August this year, Groq announced the completion of a US$640 million Series D round, with investors including BlackRock, Cisco Investments, and Samsung Catalyst Fund, at a valuation of US$2.8 billion.

Founded in 2016 by former Google engineer Jonathan Ross, the company claims its Language Processing Unit (LPU) hardware can run existing generative AI models such as GPT-4 ten times faster at one-tenth the energy consumption. With Meta's Llama 2, the company set a new large language model (LLM) performance record of 300 tokens per second per user.

Compared with the versatility of GPUs, LPUs perform well at language processing but have a narrower range of applications, which limits their use across broader AI tasks. As an emerging technology, LPUs also lack widespread community support and face challenges in availability.

Groq plans to deploy more than 108,000 LPUs by the end of the first quarter of 2025, the largest AI inference deployment outside the major tech giants.