news

is openai's "strawberry" worth trillions?

2024-09-13


author | bi andi, editor | wang jing

what do sam altman and ma baoguo have in common? answer: they both love sneak attacks. (ma baoguo is a chinese martial-arts internet celebrity whose complaint about being "sneak attacked" became a meme.)

the news about "strawberry" had been circulating for months. it was said to be a mysterious project inside openai, quite different from the previous generation of models, but openai kept it tightly under wraps. the closest it came to a reveal was a photo of real strawberries that ceo sam altman posted on social media.

just a few days ago, the information reported that "strawberry" would be released within two weeks.

even with all that attention, openai still caught the world off guard: on the afternoon of september 12 local time, with no advance notice and no press conference, it suddenly released a new model.

however, the name of the new model is not as delicious as "strawberry", but very serious and meaningful: o1.

keep in mind that openai had been iterating its models within the "gpt" series, from gpt-1 in 2018 to gpt-4o in may this year. now, openai has opened an entirely new product line.

in the official blog post announcing the release of o1, openai said: “as an early model, it does not yet have many of the features that make chatgpt useful… but for complex reasoning tasks, this is a significant improvement and represents a new level of ai capabilities. in view of this, we reset the counter back to 1 and named this series openai o1.”

the new model is currently only available to paid chatgpt subscribers and some programmers. to show that the model is not yet mature, it is temporarily called "o1-preview". in addition, openai also released a small model version o1-mini. both o1-preview and o1-mini currently have a weekly response limit.

altman himself praised the new model on x as "our most powerful and consistent model to date", while also stressing that "o1 still has flaws and is still limited."

gary marcus, an ai scholar who always likes to pour cold water on chatgpt, joked that openai's move was a "familiar recipe": announce a demo, open it to a limited number of users, raise funds, and do it again.

when o1 was released, openai was in the middle of raising a new round of funding. according to the latest news from bloomberg, the round will be a blockbuster: several billion dollars at a target valuation of $150 billion.

01

let’s first look at the model itself.

as previously rumored, one of the main focuses of o1 is "reasoning". and the key behind "reasoning" is "thinking".

for users, the most intuitive feeling is that o1-preview will take longer before answering questions.

testing the o1-preview model, zizibang asked chatgpt "what is the date and day of the week today?" after the question was sent, chatgpt displayed the steps it was working through one by one: answering the date question, reviewing the instructions, determining the current date, and then giving the answer, marked "thought for 8 seconds."

by comparison, under the gpt-4o model, chatgpt gives the answer directly within about 3 seconds, without showing any intermediate steps.
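for readers who want to reproduce this kind of side-by-side comparison, here is a minimal sketch using openai's official python sdk. the timing harness and truncated printout are our own additions, and o1-preview access requires a paid account; this is an illustration, not the procedure the article's test used.

```python
# a minimal sketch (not from the article): send the same question to
# gpt-4o and o1-preview and time the responses. assumes the openai
# python sdk (>=1.0) is installed and OPENAI_API_KEY is set in the
# environment.
import time

from openai import OpenAI

client = OpenAI()

question = "what is the date and day of the week today?"

for model in ("gpt-4o", "o1-preview"):
    start = time.time()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    elapsed = time.time() - start
    print(f"{model}: answered in {elapsed:.1f}s")
    # print only the start of the answer to keep the output short
    print(response.choices[0].message.content[:200])
```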

"this is a new type of large oracle model trained through reinforcement learning, designed to perform complex reasoning tasks. o1 thinks before answering questions - it can generate a long internal 'thought chain' before answering the user," openai wrote in a blog post.

when the o1 model was released, openai revealed very few technical details, repeatedly emphasizing only the "chain of thought."

according to openai, o1 uses a chain of thought when trying to solve problems, much as a human might think for a long time before answering a difficult question. through reinforcement learning, o1 has learned to refine its chain of thought and sharpen the strategies it uses. it can recognize and correct its own mistakes, and it learns to break complex steps into simpler ones. when the current approach does not work, it tries a different one.

"this process greatly improves the model's reasoning capabilities."

how powerful is o1? beyond the several demo videos openai released, the most convincing evidence is the test results. openai said that on many reasoning-heavy benchmarks, o1's performance is "comparable to human experts" and better than its previous models. for example, on a qualifying exam for the international mathematical olympiad (imo), gpt-4o solved only 13% of the problems, while o1 scored 83%.

on codeforces programming contests, o1 ranked in the 89th percentile. building on o1, openai also trained o1-ioi, a version specialized for competitive programming, which performed better than 93% of human contestants.

another test openai made a point of showing off was gpqa-diamond, a benchmark of expert knowledge in chemistry, physics, and biology. openai recruited experts with doctoral degrees to answer the same questions and found that "o1's performance exceeded that of these human experts."

openai also said that after enabling visual perception capabilities, o1 scored 78.2% in the mmmu test, "becoming the first model that can compete with human experts." in addition, o1 surpassed gpt-4o in 54 of the 57 mmlu subcategories.

in short, compared with openai's previous models, o1 puts more weight on reasoning ability, with especially large gains in mathematics and programming. to exaggerate only a little, it out-punches the phds and out-kicks the experts. moreover, the "chain of thought" is expected to reduce the model's hallucinations.

02

however, o1 is still in its early stages, and as altman emphasized, "it still has flaws and is still limited."

in zizibang's own quick test, o1-preview still made mistakes. when asked "which is bigger, 9.11 or 9.9", gpt-4o gave the wrong answer, and so did o1-preview, which earnestly explained that "9.11 is indeed bigger than 9.9, because 9.11 (i.e. 9.11) is bigger than 9.9 (i.e. 9.90)". the verbose answer carried a hint of unintended comedy, and it took 15 seconds of "thinking" to get there.

the information also reported that some users who tried o1-preview said that many interactions "are not worth waiting an extra 10 to 20 seconds" and that they preferred the response speed of gpt-4o.

currently, o1-preview and o1-mini are open to paying users, but the number of messages is limited: 30 messages per week for o1-preview and 50 messages per week for o1-mini.

starting next week, chatgpt's enterprise and education (edu) users will also be able to access these two models. openai also said it wants to provide o1-mini to all users for free in the future, but the specific time has not been announced.

this is the first time that openai has added a suffix like "preview" when releasing a model. previously, both gpt-4 and gpt-4o were directly released in full.

one feature of o1 that cannot be ignored is that it is expensive.

developer access to o1 is costly: via the api, o1-preview charges $15 per million input tokens (the chunks of text the model parses), three times gpt-4o's rate, and $60 per million output tokens, four times gpt-4o's rate.
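to make the price gap concrete, here is a back-of-the-envelope calculation using the per-token rates quoted above (gpt-4o's rates of $5 and $15 are implied by the stated multiples); the request sizes are made up for illustration.

```python
# back-of-the-envelope api cost comparison at the per-token rates
# quoted above (dollars per 1 million tokens); request sizes are
# made up for illustration.
PRICES = {  # model: (input, output) dollars per 1m tokens
    "gpt-4o": (5.00, 15.00),
    "o1-preview": (15.00, 60.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """dollar cost of one request at the published per-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# example: a 2,000-token prompt with a 1,000-token answer. note that
# o1 also bills its hidden reasoning tokens as output tokens, so real
# o1-preview requests tend to cost more than this naive estimate.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
```

at these rates the sample request costs $0.025 on gpt-4o versus $0.09 on o1-preview, 3.6 times more before even counting the hidden reasoning tokens.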

the atlantic argued in its report that o1 is deliberately designed to take more time per query, which inevitably consumes more resources and makes it even harder for generative ai to turn a profit.

03

gary marcus, mentioned at the beginning of this article, is a scholar working at the intersection of human neuroscience and artificial intelligence, a professor emeritus at new york university, and the founder of the ai startup geometric intelligence. he is better known as a "thorn in the side of the ai world" and has criticized openai many times.

in his opinion, openai's sudden release of o1-preview is more of a publicity stunt.

after all, openai is in the middle of an important round of financing. according to a recent bloomberg report, openai is negotiating to raise $6.5 billion from investors at a valuation of $150 billion, and also wants to raise $5 billion from banks in the form of a revolving credit facility.

"release a demo, open it to limited users, raise money, and repeat," marcus summarized openai's "methods."

in july this year, the information reported that openai could lose as much as $5 billion this year: employee costs of about $1.5 billion, ai training and inference costs of up to $7 billion, against expected annual revenue of $3.5 billion to $4.5 billion.

the information predicted at the time that, burning money at this rate, openai would soon need to raise funds again. openai's last major financing came in early 2023, when microsoft invested roughly $10 billion.

this is not the first time that openai has released an "immature product" at a critical juncture.

last october, openai was reportedly seeking to sell employee shares at a possible valuation of $86 billion. the following month, however, openai went through a stunning leadership upheaval: altman was ousted from the company, then quickly returned to the ceo seat, winning the boardroom battle. the share sale was briefly delayed, and it was not until the end of november that the deal was reported to be "back on track." at the time, people familiar with the matter said employees worried the sudden turmoil would hurt the share sale and the company's valuation.

interestingly, on february 15 this year, openai suddenly announced a new video generation model, sora, and the demos caused a sensation. within three days, the new york times reported that openai had completed the sale of employee shares, with the company's valuation exceeding $80 billion "as expected."

now, more than half a year later, sora still has not been opened to the public, nor has it even gone through large-scale testing. outsiders have begun to suspect that sora simply lacks the computing power to run at scale. a report by market research firm factorial funds estimated that deploying sora would require some 720,000 nvidia h100 chips.

in early september, taiwan's economic daily reported that tsmc's a16 angstrom-class process had received orders from major customers, including apple and openai, and that openai would use the custom chips to improve sora's video generation. this seems to confirm that sora has indeed run into computing-power bottlenecks.

now that the strawberry-flavored chatgpt is here, perhaps we will soon see news that openai has completed its new round of financing, with a valuation exceeding one trillion yuan (about $150 billion).