news

openai releases new o1 model: it will be as "thoughtful" as humans

2024-09-13


author: sukhoi

without any warning, openai's long-promoted "strawberry" model was released.

screenshot of the o1 model introduction, source: openai

early this morning beijing time, openai released a new model called openai o1, previously promoted as "strawberry" and originally code-named "q*". openai ceo sam altman called it "the beginning of a new paradigm."

from openai's official information, o1's characteristics can be summed up as: bigger, stronger, slower and more expensive.

openai o1 makes significant progress in reasoning through reinforcement learning. the r&d team observed that the o1 model's performance improves steadily with more training time (more reinforcement learning) and more thinking time (more computation at test time). the challenges of scaling this approach are very different from the pre-training constraints of large language models (llms).

o1 performance increases steadily with training time and test time. source: openai
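openai does not disclose how o1 allocates this extra "thinking time". but the basic claim, that accuracy keeps rising as more computation is spent at inference, can be illustrated with a simple self-consistency scheme: sample several independent reasoning chains and keep the majority answer. a minimal sketch, where sample_answer() is a hypothetical stand-in for a single model call:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    # hypothetical stand-in for one model call; imagine each independent
    # reasoning chain reaches the right answer 60% of the time
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def majority_vote(question: str, n_samples: int) -> str:
    # spending more test-time compute = sampling more chains,
    # then keeping the most common final answer
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# more samples per question -> the majority answer is right more often
for n in (1, 5, 25):
    correct = sum(majority_vote("what is 6 * 7?", n) == "42" for _ in range(500))
    print(f"{n:>2} chains: accuracy {correct / 500:.0%}")
```

as long as each chain is right more often than not, voting across more chains pushes accuracy up, which is one simplified way that test-time compute buys performance.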

regarding the rumor on the market that "the o1 model can autonomously perform browser or system operation level tasks for users", the current public information does not mention this function.

openai officials said: "although this initial model does not have functions such as searching the internet or uploading files and images, it represents significant progress in solving complex reasoning problems and a new level of artificial intelligence capability. so we decided to give this series a new starting point and named it openai o1." it can be seen that o1's main application remains problem solving and analysis through text interaction, rather than directly controlling the browser or operating system.

unlike earlier versions, the o1 model "thinks carefully" before answering, much as a human would: it spends roughly 10-20 seconds generating a long internal chain of thought, during which it can try different strategies and recognize its own mistakes.

this powerful reasoning ability gives o1 broad application potential across industries, especially for complex science, mathematics, and programming tasks. on physics, chemistry, and biology problems, o1's performance is comparable to that of doctoral students in the field. on the american invitational mathematics examination (aime), a qualifying exam for the u.s. mathematical olympiad, o1's accuracy was 83%, placing it among the top 500 students in the united states, while the gpt-4o model managed only 13%.

altman also shared o1 on x, source: x

openai provides some specific use cases. for example, medical researchers can use o1 to annotate cell sequencing data; physicists can use o1 to generate the complex mathematical formulas required for quantum optics; software developers can use it to build and execute complex multi-step workflows.

the o1 series includes three models: openai o1, openai o1-preview and openai o1-mini. the latter two are open to users starting today:

OpenAI o1: advanced reasoning model, not open to the public yet.

OpenAI o1-preview: this version focuses more on deep reasoning processing and can be used 30 times per week.

OpenAI o1-mini: this version is more efficient and cost-effective, suitable for coding tasks, and can be used 50 times per week.

developers and researchers can now access these models through chatgpt and an application programming interface.
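for reference, a minimal sketch of what such an api call looks like with openai's python sdk; the model name matches the release notes, but treat the snippet as illustrative rather than as the definitive interface:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# a single o1 call can take noticeably longer than gpt-4o, because the
# model spends tokens on an internal chain of thought before answering
response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the cheaper variant
    messages=[{"role": "user", "content": "prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```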

as for the price: the information earlier reported that openai executives were discussing pricing the upcoming "strawberry" and "orion" models at $2,000 per month, which drew plenty of complaints and condemnation. but today, users discovered that a chatgpt pro membership had quietly launched at $200 per month. the drop from $2,000 to $200 makes it hard not to feel like a bargain, and openai has clearly played this bit of pricing psychology well.

in may this year, during a fireside chat with mit president sally kornbluth, altman suggested that gpt-5 might separate data from the reasoning engine.

"gpt-5 or gpt-6 can become the best reasoning engine. at present, the only way to achieve the best engine is to train a large amount of data." altman believes.but in reality, the model wastes a lot of data resources when processing data.for example, gpt-4 can also work like a database, but the reasoning speed is slow, expensive, and the effect is "unsatisfactory".these problems are essentially a waste of resources caused by the design and training methods of the model.

"it's inevitable; this is a side effect of the only way we currently build our inference engines," he said. the new approach he foresees is to separate a model's reasoning ability from its dependence on big data.

but in today's release, gpt-5 did not appear, and the idea of separating data from the inference engine was also missing.


2. polish the "chain of thought"

large models have always been criticized for their "inability to count". the root cause is that they lack the ability to perform structured reasoning.

reasoning is one of the core abilities of human intelligence. large models are trained mainly on unstructured text data, typically news articles, books, and web pages. text is natural language and does not follow strict logical or structural rules, so the model mainly learns how to generate language from context, not how to reason logically or process information according to fixed rules.

but many complex reasoning tasks are structured.

for example, logical reasoning, mathematical problem solving, and programming. to get out of a maze, you must follow a series of logical and spatial rules to find the exit. problems of this type require the model to understand and apply a fixed sequence of steps or rules, which is exactly what most large models lack.
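the maze is a useful contrast because it has an exact, rule-following solution that next-token prediction does not learn implicitly. the sketch below solves it deterministically with breadth-first search, purely to show what "structured reasoning" looks like once the rules are written down explicitly:

```python
from collections import deque

MAZE = [
    "S.#.",
    ".#..",
    "...#",
    "#..E",
]

def solve(maze):
    # structured reasoning made explicit: fixed rules (four moves, no walls),
    # exhaustive search, and a provably shortest path out
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "S")
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        (r, c), path = queue.popleft()
        if maze[r][c] == "E":
            return path
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None

print(solve(MAZE))  # shortest S -> E path as a list of coordinates
```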

therefore, although models such as chatgpt and bard can generate seemingly reasonable responses from their training data, they behave more like "stochastic parrots": they often cannot truly understand the underlying logic or perform high-level reasoning tasks.

it’s important to understand that large models excel at processing unstructured natural language text, because that’s the focus of their training data, but when it comes to tasks that require structured logical reasoning, they often struggle to perform as accurately as humans.

to solve this problem, openai turned to chain of thought (cot) to "break the deadlock".

chain of thought is a technique that helps ai models reason. it has the model spell out each step of its reasoning when answering a complex question rather than giving the answer outright, so the model answers the way a human solves a problem: working through the logic of each step first, then gradually deriving the final result.
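the simplest form of this is chain-of-thought prompting, where a worked example (or just the phrase "let's think step by step") nudges the model to lay out intermediate steps before committing to an answer. a minimal sketch of the two prompt styles, with made-up questions:

```python
# direct prompting: the model is pushed to emit an answer in one shot
direct_prompt = (
    "q: a bat and a ball cost $1.10 in total. the bat costs $1.00 more "
    "than the ball. how much does the ball cost?\n"
    "a:"
)

# chain-of-thought prompting: a worked example teaches the model to lay
# out intermediate steps before committing to a final answer
cot_prompt = (
    "q: a bat and a ball cost $1.10 in total. the bat costs $1.00 more "
    "than the ball. how much does the ball cost?\n"
    "a: let's think step by step. let the ball cost x dollars. the bat "
    "costs x + 1.00, so x + (x + 1.00) = 1.10, hence 2x = 0.10 and "
    "x = 0.05. the ball costs $0.05.\n\n"
    "q: i have 3 apples, buy 2 bags of 4 apples each, then eat 1. "
    "how many apples do i have?\n"
    "a: let's think step by step."
)
print(cot_prompt)
```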

however, manually labeling chains of thought for training is time-consuming and expensive, and at the data volumes that scaling laws demand, it is basically an impossible task for humans.

at this point, reinforcement learning becomes a more practical alternative.

reinforcement learning lets the model learn by itself through practice and trial and error. it does not require a human to label each specific step; instead, the model refines its problem-solving approach through continuous experimentation and feedback.

specifically, the model adjusts its behavior based on the results (good or bad) of the actions taken while trying to solve the problem. in this way, the model can autonomously explore multiple possible solutions and find the most effective method through trial and error. for example, in a game or simulation environment, ai can continuously optimize its strategy through self-play, and eventually learn how to perform complex tasks accurately without manual guidance of each step.
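as a concrete illustration of that reward-feedback loop, here is a toy tabular q-learning agent in a six-cell corridor; nothing about it is specific to o1, it just shows how a policy emerges from trial and error with no human labeling any intermediate step:

```python
import random

N_STATES = 6             # corridor cells 0..5; the only reward sits at the right end
ACTIONS = (-1, +1)       # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    # pick the highest-valued action, breaking ties randomly
    best = max(q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(s, a)] == best])

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2 = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s2 == N_STATES - 1 else 0.0
        # no human labels each step: the update is driven by reward feedback alone
        q[(s, a)] += ALPHA * (reward + GAMMA * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2

# the learned greedy policy marches straight to the goal: [1, 1, 1, 1, 1]
print([greedy(s) for s in range(N_STATES - 1)])
```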

for example, alphago, which swept the go world in 2016, combined deep learning with reinforcement learning: it continuously refined its decision-making model through a large number of self-play games and ultimately defeated the world's top go player lee sedol.

the o1 model uses the same approach as alphago to solve problems step by step.

in this process, o1 continuously improves its thinking through reinforcement learning: it learns to recognize and correct errors, break complex steps into simpler parts, and try new approaches when it hits an obstacle. this training significantly improves o1's reasoning ability and lets it solve problems more effectively.

greg brockman, one of openai's co-founders, is "very proud" of this. "this is the first model we trained using reinforcement learning," he said.

screenshot of brockman's tweet, source: x

brockman explained that openai's earlier models performed system-1 thinking (fast, intuitive decisions), while chain-of-thought technology activates system-2 thinking (careful, analytical thought).

system-1 thinking suits quick responses, while system-2 thinking uses the chain of thought to let the model reason through problems step by step. practice has shown that training a model end to end through continuous trial and error (as in games like go or dota) can greatly improve its performance.

in addition, although the o1 technique is still at an early stage of development, it already performs well on safety. for example, strengthening the model's deep reasoning about safety policies improves its robustness against attacks and reduces the risk of hallucinations. this deep-reasoning capability has begun to show positive results in safety evaluations.

"we developed a new model based on the o1 model and allowed it to participate in the 2024 international olympiad in informatics (ioi) competition, where it scored 213 points in the 49% ranking," openai said.

it competed under the same conditions as the human contestants: six algorithmic problems, with 50 submissions allowed per problem. it screened many candidate solutions and chose which to submit based on public test cases, model-generated test cases, and a scoring function; this selection strategy averaged a higher score than submitting candidates at random.
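openai describes this selection strategy only at a high level. a rough sketch of the generate-filter-rank loop, where every helper function is a hypothetical stand-in:

```python
import random

# hypothetical stand-ins: the real system uses the model as the sampler, a
# sandboxed judge as the test runner, and a learned scoring function
def generate_solution(problem):
    return {"code": f"candidate-{random.random():.6f}"}

def passes(candidate, tests):
    return random.random() < 0.3  # pretend ~30% of candidates pass a test set

def score(candidate, problem):
    return random.random()

def select_submissions(problem, n_candidates=1000, k=50):
    candidates = [generate_solution(problem) for _ in range(n_candidates)]
    # filter: discard anything that fails the public tests or the
    # model-generated tests before spending any real submissions
    survivors = [c for c in candidates
                 if passes(c, problem["public_tests"])
                 and passes(c, problem["generated_tests"])]
    # rank survivors with the scoring function and submit only the top k,
    # which on average beats submitting candidates at random
    survivors.sort(key=lambda c: score(c, problem), reverse=True)
    return survivors[:k]

problem = {"public_tests": [], "generated_tests": []}
print(len(select_submissions(problem)), "submissions selected")
```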

when the limit was relaxed to 10,000 submissions per problem, the model performed even better, exceeding the gold-medal threshold. finally, the model showed "amazing" coding ability in a simulated codeforces programming competition: gpt-4o achieved an elo rating of 808, in the 11th percentile of human competitors, while the new model reached 1807, outperforming 93% of competitors.

further fine-tuning in programming competitions improved the performance of the o1 model. source: openai

3. openai in a turbulent time

before the release of o1, openai had been mired in the shadow of changes in the company's core senior management.

in february, andrej karpathy, a founding member and research scientist at openai, announced on x that he had left the company. karpathy said he left openai amicably and "not because of any particular incident, issue, or drama."

former chief scientist and co-founder ilya sutskever announced his resignation in may, and the superalignment team was disbanded. the industry sees this as a failed attempt by openai to balance the pursuit of technological breakthroughs with ensuring ai safety.

from right: ilya sutskever, greg brockman, sam altman and mira murati. source: the new york times

hours after sutskever's announcement, jan leike, one of the inventors of rlhf and co-lead of the superalignment team, followed in his footsteps and left, adding further uncertainty to openai's future.

in august, john schulman, co-founder and research scientist at openai, announced his departure and joined anthropic to focus on ai alignment research. he explained that he left to concentrate on alignment and technical work, not because openai did not support alignment research. schulman thanked his colleagues at openai and said he was "confident" in its future.

anthropic was founded by dario amodei, openai's former vice president of research who left in 2020, and daniela amodei, then openai's vice president of safety and policy.

brockman also announced a one-year sabbatical that same month, his “first extended sabbatical” since co-founding openai nine years ago.

on september 10, alexis conneau, who led openai's gpt-4o and gpt-5 model audio interaction research, announced his resignation to start his own business. conneau's research was dedicated to achieving the natural voice interaction experience shown in the movie "her", but the release of related products has been repeatedly delayed.

since its inception, openai has attracted much attention for its dual identities of non-profit and commercialization. as the commercialization process accelerates, internal tensions about its non-profit mission have become increasingly apparent, which is also a reason for the loss of team members. at the same time, elon musk’s recent lawsuit may also be related to the loss of personnel.

openai researcher daniel kokotajlo said in an interview after leaving the company that in last year's boardroom power struggle, altman was briefly fired and then quickly reinstated, and three board members focused on agi safety were replaced. "this allowed altman and brockman to further consolidate their power, while those who focused on agi safety were marginalized. they deviated from the company's 2022 plan."

beyond the personnel turmoil, openai is also under heavy financial pressure and is seeking a new round of financing at a valuation of more than $100 billion. potential investors such as microsoft, apple and nvidia have expressed interest, and company executives are courting investment worldwide to support the company's rapidly growing funding needs.

according to a report by the new york times on the 11th, openai had hoped just last week to raise about $1 billion at a valuation of $100 billion. however, because the computing power needed to build large-scale ai systems will drive expenses even higher, the company recently decided to raise the financing target to $6.5 billion.

meanwhile, foreign media, citing insiders and analysis of undisclosed internal financial data, reported that openai may face a loss of up to $5 billion this year, with total operating costs expected to reach $8.5 billion: about $4 billion for renting servers from microsoft and $3 billion for data training. as more advanced models such as strawberry and orion cost even more to run, the company's financial pressure has only increased.

(cover image source: openai)