slow and expensive? openai's reasoning model "strawberry" is here, so how far off is gpt-5?

2024-09-13

openai's "strawberry" is finally here.

on september 12th local time, artificial intelligence (ai) giant openai released the openai o1 series of ai reasoning models code-named "strawberry", including openai o1-preview and a smaller version o1-mini.

according to openai, o1 spends more time thinking about a problem before responding, much as a person would. through training, it has learned to refine its thinking process, try different strategies, and recognize its mistakes. it can reason through complex tasks and solve harder problems in science, coding, and mathematics than previous models could.

openai said that this series of models represents a major advance on complex reasoning tasks, so it reset the product counter to 1 and named the series openai o1. the "o" in the name may refer to orion.

stronger reasoning, and chain of thought opens new opportunities for model safety

the strawberry project was previously called q* (pronounced q-star), and it is said to have triggered the openai boardroom coup and the sudden dismissal of the company's ceo, sam altman. at that time, according to sources, openai's chief technology officer mira murati told employees that a letter about the q* breakthrough had prompted the board to fire altman.

openai's progress on q* led some inside the company to believe that it could be their breakthrough in the search for "superintelligence" (agi, artificial general intelligence).

according to official information, o1 far exceeds gpt-4o in many capabilities. it ranks in the 89th percentile on competitive programming questions (codeforces), places among the top 500 students in the united states in a qualifier for the usa mathematical olympiad (aime), and exceeds the accuracy of human phd-level experts on a benchmark of physics, biology, and chemistry questions (gpqa).

much as a person may think for a long time before answering a difficult question, o1 uses a chain of thought when trying to solve a problem. through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. it learns to identify and correct errors, break down tricky steps into simpler ones, and try a different approach when the current one is not working.

openai said that chain-of-thought reasoning can significantly improve safety and alignment, because the model's thinking can be observed in a legible way and the model's reasoning about safety rules is more robust to out-of-distribution scenarios.

xu li, executive chairman and ceo of sensetime, has previously said that in the future, how smart a model is will depend entirely on whether the methodology for constructing its chain-of-thought data is strong enough, and whether that methodology can be sustained and iterated on.

partial screenshot of openai o1's original, complete chain of thought

however, openai ultimately chose not to show the raw chain of thought to users, and shows a summary of it instead. openai further explained: "the hidden chain of thought provides a unique opportunity for monitoring models. assuming it is faithful and legible, the hidden chain of thought allows us to 'read the mind' of the model and understand its thought process. for example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. however, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. we also do not want to make an unaligned chain of thought directly visible to users."

currently, the o1 model is available to plus and team users of chatgpt, and will be available to enterprise and education users starting next week.

slower and more expensive, and not a gpt-4o successor

currently, the weekly message limit is 30 for o1-preview and 50 for o1-mini. openai said it is working to raise these limits, to let chatgpt automatically select the appropriate model for a given prompt, and to extend access to more user tiers, with a plan to make o1-mini available to all free users.

on the api side, o1-preview costs $15 per 1 million input tokens and $60 per 1 million output tokens (tokens are the units a large model breaks text into: words, pieces of words, or combinations of words and punctuation). that makes the input price 3 times that of gpt-4o and the output price 4 times.

o1-mini is positioned as the faster and more cost-effective option, and is said to be particularly strong at mathematics and programming. on evaluation benchmarks such as aime and codeforces its performance is almost comparable to o1. it can be used as an alternative to o1-preview with higher rate limits and lower latency. for api users, o1-mini is 80% cheaper than o1-preview.
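the pricing above can be turned into a rough per-request comparison. the sketch below is only an illustration: the o1-preview prices are the ones quoted in this article, the gpt-4o prices are back-calculated from the stated 3x and 4x ratios rather than quoted, and "80% cheaper" is assumed to apply equally to input and output tokens. the function name `request_cost` and the token counts are hypothetical.

```python
# prices in us dollars per 1 million tokens, as quoted in the article
O1_PREVIEW = {"input": 15.0, "output": 60.0}

# assumption: derived from the article's ratios (input 3x, output 4x),
# not quoted directly in the article
GPT_4O = {"input": O1_PREVIEW["input"] / 3, "output": O1_PREVIEW["output"] / 4}

# assumption: "80% cheaper" means one fifth of the o1-preview price
# for both input and output tokens
O1_MINI = {kind: price / 5 for kind, price in O1_PREVIEW.items()}


def request_cost(prices, input_tokens, output_tokens):
    """return the dollar cost of one request at per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # hypothetical request: 2,000 input tokens and 1,000 output tokens
    models = [("o1-preview", O1_PREVIEW), ("o1-mini", O1_MINI), ("gpt-4o", GPT_4O)]
    for name, prices in models:
        print(f"{name}: ${request_cost(prices, 2_000, 1_000):.3f}")
```

under these assumptions, the same request costs several times more on o1-preview than on gpt-4o, and the gap widens as the share of output tokens grows, because the output price ratio (4x) is larger than the input ratio (3x).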

other drawbacks are also obvious. openai o1 is slower than other models. according to foreign media reports, o1 may take more than 10 seconds to answer some questions. while it works, the model shows its progress by displaying a label for the subtask it is currently performing.

at the same time, given the unpredictable nature of generative ai models, o1 may have other flaws and limitations. for example, it sometimes makes mistakes in tic-tac-toe. in a technical paper, openai said some testers reported that o1 was more prone to hallucinations than gpt-4o and less willing to admit when it did not know the answer to a question.

additionally, o1 is currently unable to browse the web or analyze files.

altman said that o1 is openai's most powerful and most aligned series of models, but admitted that it still has flaws.

openai president greg brockman also said that the technology behind o1 is still in its early stages, and that openai is actively working on aspects including reliability, hallucinations, and robustness against attackers.

openai said that as an early model, openai o1 does not yet have many of the features that make chatgpt useful, such as browsing the web for information and uploading files and images. for many common use cases, gpt-4o will be more capable in the short term. regular updates and improvements will follow, and "work is still in progress to make this new model as easy to use as the current model."

openai emphasized that openai o1 is not the "successor" to gpt-4o, and that 4o can be used in combination with o1's reasoning capabilities. it also plans to continue developing and releasing models in the gpt series alongside the o1 series.

comparison of openai o1 and gpt-4o on multiple benchmarks

it is worth noting that, according to foreign media reports, openai may be considering higher subscription prices for the "strawberry" model or its next-generation flagship model, possibly as high as $2,000 per month. by comparison, chatgpt plus currently costs $20 per month.

openai is in turmoil, when will gpt-5 arrive?

according to foreign media reports, murati said that the next-generation flagship model, gpt-5, is currently being built and will be much larger than its predecessor. the company still believes that scale will help unlock new capabilities in artificial intelligence, but gpt-5 is also likely to include the reasoning technology launched this time.

“there are two paradigms,” murati said. “the scaling paradigm and this new paradigm. we want to bring them together.”

the road to gpt-5 has not been easy.

on september 11, local time, alexis conneau, the research lead for gpt-4o/gpt-5 and technical lead of openai's version of "her", suddenly announced his resignation. before that, openai co-founder and chief scientist ilya sutskever, "superalignment" team leader jan leike, co-founder john schulman, and chatgpt head peter deng had all resigned... currently, only two of the 11 founders remain at openai.

in addition to the personnel turmoil, openai is also reported to face a us$5 billion gap between revenue and spending this year.

but what we can see is that openai is already taking action.

openai's latest plan is to raise about $6.5 billion in new financing at a valuation of $150 billion. at the end of last month, openai was reported to be preparing a new $1 billion financing round led by venture capital firm thrive capital, with microsoft, apple, and nvidia likely to take part, at a valuation of just over $100 billion.

on august 29th local time, openai said that chatgpt now has more than 200 million weekly active users, double the number last fall.

openai says 92% of fortune 500 companies use its products, and that usage of its api has doubled since the launch of gpt-4o mini in july.

thepaper.cn reporter qin sheng

(this article is from the paper.)
