
Big news! OpenAI's reasoning model that can "think through problem-solving logic" is launched, and its cognition leaps to the "level of a PhD student in science"

2024-09-13


At around 1 a.m. Beijing time on Friday, the AI era reached a new starting point: large models capable of general, complex reasoning have finally come to the fore.

Image source: Visual China (VCG31N2008743681)

OpenAI released an announcement on its official website: it has started rolling out the o1-preview model, the previously widely anticipated "Strawberry" model, to all subscribers. OpenAI said that for complex reasoning tasks the new model represents a new level of AI capability, which is why the company reset the counter to 1 and gave it a name distinct from the "GPT-4" series.

The defining trait of the large reasoning model is that the AI spends more time thinking before it answers, much as a human thinks through a problem. Previous large models, by contrast, predict word sequences by learning patterns from massive datasets, and strictly speaking do not truly understand the question.

As the first versions of the o1 series, OpenAI launched only the o1-preview and the smaller o1-mini, rolling them out in stages to paying users, free users, and developers; for developers, the price is quite steep.

The o1 model costs at least three times as much as GPT-4o and is trained with a completely new method

According to reports, the new o1 model can answer more complex programming, mathematics, and science questions thanks to a brand-new training method. It "thinks" before giving an answer, and does so faster than a human. The smaller, cheaper mini version focuses on programming use cases.

ChatGPT Plus and Team paid users can access both models immediately by selecting them manually from the model drop-down menu in the user interface. ChatGPT Enterprise and Edu users will get access next week, and o1-mini will be opened to all free users at an unspecified future date. OpenAI hopes eventually to select the right model automatically based on the prompt.

However, developer access to o1 is expensive. Through the API (application programming interface), o1-preview charges $15 per million input tokens, three times the cost of GPT-4o, and $60 per million output tokens, four times the cost of GPT-4o. A token is the unit of text the model parses; one million tokens corresponds to roughly 750,000 words.
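Since the rates above are quoted per million tokens, the per-request cost is simple arithmetic. The sketch below compares the two models using only the prices stated in this article; the GPT-4o rates are back-calculated from the "three times"/"four times" ratios, and the token counts are made-up example values:

```python
# Rough per-request cost estimate from the per-million-token rates above.
O1_PREVIEW_INPUT_PER_M = 15.00   # USD per 1M input tokens (from the article)
O1_PREVIEW_OUTPUT_PER_M = 60.00  # USD per 1M output tokens (from the article)
GPT4O_INPUT_PER_M = 5.00         # implied by "three times the cost of GPT-4o"
GPT4O_OUTPUT_PER_M = 15.00       # implied by "four times the cost of GPT-4o"

def cost_usd(input_tokens, output_tokens, in_rate, out_rate):
    """Price of one request given per-million-token rates."""
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Example: a request with 2,000 input tokens and 1,000 output tokens.
o1 = cost_usd(2_000, 1_000, O1_PREVIEW_INPUT_PER_M, O1_PREVIEW_OUTPUT_PER_M)
gpt4o = cost_usd(2_000, 1_000, GPT4O_INPUT_PER_M, GPT4O_OUTPUT_PER_M)
print(f"o1-preview: ${o1:.4f}, GPT-4o: ${gpt4o:.4f}")
```

At these rates the same request costs $0.09 on o1-preview versus $0.025 on GPT-4o, a 3.6x gap for this particular input/output mix.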

Jerry Tworek, head of research at OpenAI, told the media that the training method behind o1 is fundamentally different from that of previous models.

First, o1 was "trained using a novel optimization algorithm and a new training dataset specifically tailored for it," which included "reasoning data" and scientific literature.

Second, whereas previous GPT models were trained to imitate the rules and patterns of their datasets, o1 uses reinforcement learning: the model is taught through rewards and penalties to solve problems on its own. It then handles user queries with a "chain of thought" and provides a summary of that chain, similar to how humans work through a problem step by step.

In the screenshot, clicking the chain of thought reveals how the o1 model "thinks".

Image: a diagram of the chain of thought for a complex mathematical problem

OpenAI believes this new training method will make the o1 model more accurate and reduce "hallucinations," the problem of making up answers, though it cannot eliminate them entirely. The main difference from GPT-4o is that o1 is better at solving complex problems such as programming and mathematics; it also refines its reasoning process, tries different strategies, and identifies and corrects errors in its own answers.

Cognition will leap to the "level of a PhD student in science"

OpenAI once explained that GPT-4, released in 2023, is roughly at the intelligence level of a high school student, while GPT-5 will complete AI's growth "from high schooler to PhD." The o1 model is a key step in that process.

Compared with existing large models such as GPT-4o, OpenAI o1 can solve harder reasoning problems while fixing mechanistic defects of earlier models.

For example, the new model can correctly count how many "r"s there are in "strawberry."
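Counting the "r"s in "strawberry" is exactly the kind of task an ordinary program answers unambiguously, which is why it became a quick test of whether a model reasons about the letters rather than pattern-matching on words:

```python
# Counting letters with plain string operations gives the ground truth
# the model's answer is checked against.
word = "strawberry"
print(word.count("r"))  # → 3
```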

At the same time, the AI is more organized when answering programming questions: before writing any code, it thinks through the entire solution, and only then starts outputting code.

For example, in a poem-writing task with preset conditions (say, the last word of the second line must end in "i"), GPT-4o, which "picks up the pen and writes," does give an answer, but it often satisfies only some of the conditions and does not self-correct; it must hit the correct answer on its first generation or it is bound to err. The o1 model, by contrast, keeps trying and polishing its answer, significantly improving the accuracy and quality of the result.
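The try-and-polish behavior described above can be sketched as a generate-check-revise loop. Everything below is a hypothetical illustration, not OpenAI's actual mechanism: `generate_line` is a canned stand-in for the model, and the check mirrors the "line must end in i" condition from the example.

```python
# A minimal generate-check-revise loop. A single-shot model must satisfy
# the constraint on its first try; a loop can detect the miss and retry.

def satisfies_constraint(line: str) -> bool:
    """Preset condition from the example: the line must end with 'i'."""
    return line.rstrip(".!?,").endswith("i")

def generate_line(attempt: int) -> str:
    # Hypothetical model: early drafts miss the constraint, later ones fix it.
    drafts = [
        "the moon hangs low",
        "the moon meets the sea",
        "beneath a sky of lapis lazuli",
    ]
    return drafts[min(attempt, len(drafts) - 1)]

def write_with_revision(max_attempts: int = 5) -> str:
    for attempt in range(max_attempts):
        line = generate_line(attempt)
        if satisfies_constraint(line):  # self-check before answering
            return line
    return line  # thinking budget exhausted; return the last draft

print(write_with_revision())
```

The first two drafts fail the check, so the loop keeps revising until the third draft satisfies the constraint, which is the qualitative difference the article describes between GPT-4o's one-shot answer and o1's iteration.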

Interestingly, when you click on the AI's thinking process, it says things like "I am considering whether this works" or "Ah, time is running out, I need to give an answer soon." OpenAI confirmed that what is displayed is not the raw chain of thought but a "summary generated by the model," and the company frankly admitted that maintaining a "competitive advantage" is one factor behind this choice.

Jerry Tworek reiterated that the training behind the o1 model is fundamentally different from previous products: while earlier GPT models were designed to mimic patterns in their training data, o1 is trained to solve problems independently. During reinforcement learning, reward and penalty mechanisms "educate" the AI to use chains of thought, just as humans learn to break down and analyze problems.

In testing, the o1 model scored 83% on a qualifying exam for the International Mathematical Olympiad, where GPT-4o solved only 13% of the problems correctly. In the competitive programming contest Codeforces, the o1 model reached the 89th percentile, versus the 11th for GPT-4o.

OpenAI said that, according to its tests, the next updated version performs at PhD level on challenging benchmarks in physics, chemistry, and biology.

Disadvantages: no live web browsing, no file or image uploads, limited world knowledge, and possibly more prone to hallucinations

However, as the initial release of the o1 series, the o1-preview version launched today has obvious shortcomings. It is text-only: for now it cannot browse the web or accept file and image uploads. In other words, it lacks many of ChatGPT's features and is weaker than GPT-4o in many common use cases. It also has usage caps: o1-preview is limited to 30 messages per week and o1-mini to 50.

Other limitations include: the o1 model is less capable than GPT-4o in many areas and performs poorly on factual knowledge about the world; its reasoning is slow in some use cases and it can take longer to answer; and o1 is currently a text-only model that cannot reason over specific documents or gather real-time information from the internet.

In addition, getting AI models to play tic-tac-toe has long been considered a tricky problem in the industry, and the new o1 model, reasoning capabilities and all, still makes mistakes in this game, showing that the technical difficulties have not been fully overcome.

OpenAI also acknowledged in a technical paper that it has received "anecdotal feedback" that o1-preview and o1-mini hallucinate more often than GPT-4o and GPT-4o mini; that is, the AI still confidently makes up answers, and o1 rarely admits it does not know the answer to a question.

The technology outlet TechCrunch noted that in a blog post about the o1 model, OpenAI said it decided not to show users the model's raw chain of thought, offering instead a summary of it in the answer, both to maintain a "competitive advantage" and to compensate for possible shortcomings: "We strive to teach the model to reproduce any useful ideas in the chain of thought in the answer."

Daily Economic News, compiled from public information

