
musk pursues the $150 billion openai

2024-09-12


openai is very strong. if we were to put a number on its strength, that number would probably be $150 billion.

on september 11 local time, bloomberg reported that openai is raising $6.5 billion from investors at a valuation of $150 billion, and is also negotiating a $5 billion revolving credit line with banks.

news that openai was about to raise money began circulating at the end of august. at the time, media reported that the company would raise several billion dollars at a valuation of more than $100 billion, which stunned outside observers. after all, at the end of last year, when employees sold their existing shares, openai was valued at $86 billion.

according to earlier reports, besides microsoft, the giants participating in openai's current round include apple and nvidia. a sharply higher valuation and bets from the giants both speak to openai's appeal.

the other side of the story is openai's astonishing cash burn and its troubles both inside and outside the company.

openai's last major financing was in january 2023, when microsoft invested about $10 billion. in other words, openai has burned through roughly $10 billion in under two years. the information previously reported that openai's losses this year could reach $5 billion, and media predicted at the time that openai would soon need new financing, a prediction that has unfortunately come true.

internally, openai has not only slowed its pace of updates but is also seeing a steady exodus of senior executives.

in 2023, openai kept busy releasing gpt-4 and launching the gpt store. more than half of this year has passed, and most of openai's new products have generated heat without substance: sora was officially announced at the start of the year but has yet to enter large-scale testing, and the long-rumored new model "strawberry" remains even more mysterious.

meanwhile, only two of openai's 11 co-founders still work there. recently, alexis conneau, a key figure behind gpt-4o and gpt-5, announced his departure.

externally, giants such as google and microsoft have quasi-acquired openai's competitors among silicon valley's other star ai startups: inflection ai and character ai already have de facto "owners."

more troublesome still, old rival elon musk has gone from trading words with openai to fighting it at close quarters.

xai, founded by musk, has become one of openai's strongest rivals in just 15 months. when its series b closed in may this year, xai not only raised $6 billion, the second-largest single round after openai's, but also reached a valuation of $24 billion, making it the second most valuable ai startup after openai.

on the product side, xai's grok model has gone through multiple iterations, and grok 2's performance is now close to gpt-4's.

then, with the open battle at a stalemate, musk announced on september 3 that colossus, a supercomputing cluster built around 100,000 nvidia h100s, had officially come online, making it the world's largest.

the compute buildout serves a grander goal: grok 3 is already on the way, and musk has vowed to produce the world's most powerful artificial intelligence by the end of this year.

openai, long constrained by compute shortages, has launched several new data-center projects with microsoft, either expanding existing microsoft supercomputers or building new facilities. it has also reportedly planned a "stargate" project involving millions of nvidia chips.

but the difference between preparing and "officially launching" is obvious. in an ai arena where every second counts, time waits for no one. according to the information, altman has told microsoft executives he worries that xai's computing power will surpass openai's.

the computing power war between musk and altman has begun.

on one side is musk, who once said "i won't marry into a wealthy family, i am the wealthy family," advancing triumphantly with xai. on the other is openai, backed by a giant and still the bellwether of the ai industry. the battle over compute has begun, and the challenge is about more than money.

musk and altman are both co-founders of openai. they joined forces when google, having acquired deepmind, was commanding attention in artificial intelligence, and they parted ways on the eve of the nonprofit openai's turn toward limited commercialization.

after chatgpt became a hit two years ago, musk launched fierce attacks on openai and even on altman personally, saying openai had "betrayed its original mission" and become a vassal of microsoft, its largest backer with tens of billions of dollars invested.

altman's responses were mostly muted, but his few remarks betray mixed feelings about musk, irritation shading into admiration. altman once called musk "a jerk"; half a year later he had something kind to say: "elon is definitely a talent magnet and the focus of attention, and he does have some real superpowers."

by that point, musk's xai had already been founded, and altman was well aware of what this silicon valley madman could do.

when xai was officially unveiled in july 2023, it was widely seen as aimed at openai, and musk did not shy away from saying so.

at the time, though, outsiders mostly watched from the sidelines. xai was a latecomer and small in scale, while the field already held openai and microsoft, giants like google, and star startups like anthropic. xai had only 11 founding members, and it was by nature averse to large investments from technology giants. some prominent media, wired among them, ridiculed xai's challenge to openai as more of a musk "hallucination."

yet xai's pace of development has exceeded everyone's imagination. this is exactly what altman meant about musk: superpowers.

the timeline makes the point. in the 13 months since its founding, xai has released its first large model, grok; the iterative grok 1.5; the multimodal grok 1.5v; grok 2; and the small model grok 2 mini.

musk, a shrewd marketer, gave grok a distinct personality. users who chat with it find a bot that is glib and sarcastic, loves jokes, and likes to challenge "political correctness."

beneath the banter, however, the grok models have been steadily eating into openai's lead.

take the latest skirmish. in august, xai released grok 2 and grok 2 mini, with large gains over the previous generation in coding, math, and reasoning, plus a new image-generation feature. on the lmsys leaderboard at the time, sus-column-r, an early version of grok 2, ranked third overall, competitive with gpt-4o and ahead of anthropic's claude 3.5 sonnet. in everyday use, grok 2 keeps its "unorthodox" style: its image generator will "draw" pictures other big-name models steer well clear of, such as musk holding a gun or murder scenes featuring disney characters, which briefly set netizens alight.

more surprising still, xai closed its series b in may this year at a total of us$6 billion, the second-largest single financing in the large-model field after openai's. openai has raised us$14 billion in total, the largest tranche being microsoft's us$10 billion investment in january 2023.

for comparison, before xai's round the largest single financings after openai's belonged to anthropic, which raised us$4 billion in september 2023 and us$2 billion that october, followed by inflection ai, which raised us$1.3 billion in june 2023.

xai started latest of all, barely a year ago, yet it now trails openai closely in both product strength and financing scale.

today, xai is a competitor openai cannot ignore, arguably one of its most formidable.

products and financing are the open contests between the two. beneath the surface, the battle over computing power is just as fierce.

on september 3, musk announced on x that colossus had officially come online.

colossus is xai's ai training supercluster in memphis, tennessee, powered by 100,000 nvidia h100 gpus.

musk also promised that in the coming months colossus will keep growing and double its gpu count to 200,000 across the cluster, of which 50,000 will be nvidia h200s.

what does a cluster consisting of 100,000 nvidia h100s mean?

simply put, it is currently the largest in the world. to elaborate: because major companies hoard gpus, public figures on actively deployed fleets are scarce. but in june last year, inflection ai, fresh off its $1.3 billion raise, boasted that it would build the "world's largest supercomputer" out of (a mere) 22,000 nvidia h100 chips. in march this year, meta announced two new data-center clusters, each containing 24,000 h100s. several companies plan supercomputers of 100,000 or more nvidia chips, but only musk has come out and said his is built.

what is striking about colossus is not just its scale but how fast it came online. in musk's words, the team built it in just 122 days, about four months. an ai training cluster of this size would ordinarily take at least a year to build.

musk first revealed in may this year that xai was building a "gigafactory of compute." in july, he announced that teams from xai, x, and nvidia had begun trial runs; the cluster was then called the "memphis supercluster."

"play to win, or don't play at all."musk's june x message is the best interpretation of his ambition, and his more specific plan is to train "the most powerful artificial intelligence in the world by every metric" by december this year.

by musk's own account, training grok 2 took 20,000 nvidia h100 chips, and training grok 3 may require 100,000.

with grok 2 already close to gpt-4, it is not hard to see that musk is after something spectacular this time, or to guess whom the "most powerful" ai is aimed at.

news like this naturally puts pressure on openai.

according to the information, people familiar with the matter say altman has expressed concern to microsoft's senior leadership that xai's computing power will soon surpass openai's.

altman had good reason to worry.

at present, openai's large-model line still stops at the "4" series. in the global rankings lmsys updated on september 4, openai's chatgpt-4o-latest, refreshed on august 8 this year, ranks first, but its overall score is only narrowly ahead of gemini-1.5-pro and grok-2. gpt-4o, updated on may 13 this year, ranks fifth.

people have in effect been waiting for gpt-5 since openai released gpt-4 in march last year. the early consensus put gpt-5 at the end of 2023 or the summer of 2024, but a few months ago mira murati, openai's chief technology officer, said publicly that it might slip to the end of 2025 or the beginning of 2026.

the delay most likely comes down to two shortages: computing power and data. murati has reportedly said gpt-5's parameters will reach 52 trillion, a huge jump from gpt-4's 2 trillion.

and it is not only next-generation gpt development that demands compute. in march this year, research firm factorial funds published a report analyzing the hardware openai would need to deploy sora, concluding that at peak sora could require 720,000 nvidia h100 chips. the report also helps explain why sora has still not been released to the public.

openai's computing power today comes mainly from microsoft, that is, from microsoft's investment. in 2020, microsoft built a supercomputer with 10,000 gpus to support openai's work; it was among the five fastest supercomputers in the world at the time.

in march last year, microsoft announced new progress in its infrastructure partnership with openai: the original 10,000-gpu supercomputer had been upgraded with tens of thousands of a100 chips, at a system cost that "may exceed" several hundred million dollars.

computing power is the bottleneck altman returns to again and again. as early as may 2023, he voiced his concerns about it at a u.s. congressional hearing.

this year, altman's framing has grown grander and firmer. he believes the two "currencies" of the future will be compute and energy: "this (compute) may be the most precious commodity in the world, and we should invest heavily to do more computing."

in march, altman also complained publicly that there were not enough nvidia gpus to support ai development. openai is reported to have since secured more server capacity from microsoft; by mid-2025, oracle and microsoft will supply openai with one of the world's most powerful nvidia server clusters, at an annual rental fee of roughly us$2.5 billion.

according to the information, microsoft is planning further upgrades and intends to build out multiple ai infrastructure projects by 2030.

microsoft and openai's compute plan is divided into five phases, and the two are currently in phase three. phase four is the supercomputer microsoft is building for openai, scheduled to enter operation around 2026. the wisconsin economic development authority said microsoft has begun building and expanding a $1 billion data center there, and people familiar with the matter say its final cost could reach $10 billion.

phase five is the headline-grabbing "stargate" project, revealed by media in march this year: microsoft and openai plan to build a supercomputer with "millions" of chips, at a cost that could reach $100 billion.

beyond seeking more compute from outside, altman has also moved personally to advance his chip ambitions. he is eager for openai to have chips of its own to bring costs down; nvidia's high-end ai chips are not only expensive but in short supply.

earlier this year came reports that altman wanted to raise as much as $7 trillion to build a chip empire, a plan widely dismissed as too grand to be realistic. by all accounts, though, altman has been very active on chips over the past year, negotiating with every party involved.

one track is chip design. in july, openai was reported to have formed an internal chip team led by richard ho, a former senior engineering leader on google's tpu. broadcom, which built tpus with google, has been in contact with the team, and broadcom's competitors are pitching their services to openai as well.

the other track is chip production. altman is said to have negotiated with executives at major chip manufacturers and suppliers, including tsmc, to expand capacity and produce more nvidia chips or even openai's own future chips. he may also have approached the memory-chip makers samsung and sk hynix earlier this year.

and altman's first step may be close at hand. taiwan's economic daily recently reported that openai has booked tsmc's a16 capacity for sora. a16 is the most advanced process node tsmc has disclosed so far and its first step into the angstrom era, with mass production expected in the second half of 2026. besides openai, apple is among a16's first customers.

the compute race between musk and altman is in full swing, but expanding compute takes more than money. each man labors under his own "curse of advantage."

musk controls many companies that bear on xai, chiefly x and tesla. x supplies data and users; tesla is even more "useful," contributing at least 11 of xai's founding members and lending it gpus outright. musk has also said that the vast visual data tesla collects could be used to train xai's models. in july, he even ran a poll on x asking fans whether tesla should invest $5 billion in xai.

but "pouring from left hand to right hand" brings convenience to musk, but also brings trouble.

tesla is not at its peak. on the contrary, its electric-vehicle sales are weak and its new flagship model has yet to launch. tesla investors have reacted strongly to the company's "blood transfusions" for xai, with steady opposition that has even spawned multiple lawsuits.

the controversy grew intense enough that even the stubborn musk had to step out and placate tesla shareholders. after news broke that he had asked nvidia to ship xai chips ahead of tesla, musk explained that tesla had nowhere yet to put the chips to use, stressed that the southern expansion of giga texas would be completed, and said tesla would spend $3 billion to $4 billion on nvidia chips this year.

musk's other great advantage in leading xai is its "independence." refusing to fall in with the tech giants lets musk and xai claim the high ground of public opinion, helps attract investment, and positions xai as a check on the giants.

but it also means musk lacks the kind of deep-pocketed patron behind startups like openai, anthropic, and inflection, each of which has taken huge single injections from a giant. the road to compute expansion must be paved with gold: 100,000 nvidia h100s alone cost about $2.5 billion (barring bulk discounts from nvidia).
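as a rough sanity check on that number (assuming a commonly cited unit price of around $25,000 per h100; nvidia's actual bulk pricing is not public): 100,000 gpus × $25,000 ≈ $2.5 billion, and that is before networking, storage, power, and the building itself.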

one of openai's biggest advantages is microsoft's backing; the software giant has been its most important supporter in the two years since chatgpt thrust the company into the spotlight.

but it is an open secret that the relationship between openai and microsoft is close yet subtly awkward.

the most direct contradiction is that openai and microsoft are partners and competitors at once. microsoft products with openai models deeply embedded, such as copilot and the new bing, compete with openai for customers. altman has flown to san francisco, new york, london, and elsewhere to pitch openai's enterprise products and services in person to hundreds of fortune 500 executives.

on september 5, openai announced that the enterprise chatgpt subscription it launched a year ago now has more than 1 million paying users.

microsoft has already adjusted at the strategic level. it quasi-acquired inflection ai, absorbing nearly all of its talent, and stood up an internal team, microsoft ai, to catch up with the other giants. in may, the information reported that microsoft was preparing a new ai model, internally codenamed mai-1, with roughly 500 billion parameters.

which raises the question: if microsoft has a high-performing large model of its own, why not replace openai?

moreover, openai's unusual governance structure has planted a time bomb under the partnership: by agreement, openai's license to microsoft terminates once agi arrives.

microsoft faces a paradox: it is pouring money and compute into pushing openai toward agi, and in doing so pushing itself toward losing openai.

the superclusters microsoft and openai are building and planning demand enormous capital. take stargate: at $100 billion, the project would strain even microsoft, whose capital expenditure for fiscal 2024 (ended june 30) was $55.7 billion.

on the road to compute expansion, openai must either lock in its relationship with microsoft or find its own way out soon. otherwise it faces the fate in the chinese idiom "success because of xiao he, failure because of xiao he": made and unmade by the same patron.

finally, the compute race between musk and altman has one more stumbling block: energy.

the information estimates that gpus draw far more power than traditional chips: a 100,000-chip cluster might need 100 megawatts of dedicated power, 10 times what a traditional data center consumes and enough to supply 70,000 to 100,000 homes.
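the 100-megawatt figure is easy to sanity-check, assuming nvidia's rated draw of up to roughly 700 watts per h100 sxm module and a typical facility overhead factor of about 1.4 for servers, networking, and cooling (both assumptions ours, not the information's): 100,000 × 0.7 kw ≈ 70 mw for the gpus alone, and 70 mw × 1.4 ≈ 98 mw, right around 100 mw for the cluster as a whole.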

this is why the claim that colossus is "already online" has been questioned. the local utility said xai would have about 50 megawatts available by august; a substation under construction can supply another 150 megawatts, but not until 2025.

some speculate that musk is bypassing the utility and powering the cluster with fossil-fuel generators, which has drawn complaints from an environmental group in memphis, tennessee.

nor is this a problem for xai and openai alone. by the information's count, 17 supercomputing centers are in use or under construction across seven u.s. states (excluding projects of questionable feasibility, such as stargate). if all of them came online, the u.s. power grid could be overwhelmed and shortages might follow.

it seems like only yesterday that musk and altman were trading barbs; now they are locked in a wary hand-to-hand fight. compute expansion unfolds as a grand narrative, with billions or even tens of billions of dollars spent to stack walls of tens or hundreds of thousands of chips. but along the way, both men still have many hurdles to clear.