
Interpretation | Xiao Yanghua: What impact will o1, a model with doctoral-level mathematical and scientific capabilities, bring?

2024-09-13


Expert-level reasoning cannot be achieved by drilling through huge numbers of practice problems; it requires genuine thinking ability. The difficulty of training reasoning ability into large models is that most human thinking processes are never written down, so data on thinking processes is extremely scarce. He speculated that OpenAI likely used a large amount of synthetic data this time.

Human understanding of artificial intelligence is finding it increasingly hard to keep pace with AI's development, and that is a huge governance challenge. Humans are the ones who unlock AI's magic; if AI gains superhuman capabilities, humans may well be unable to activate those powers, because doing so lies beyond human cognition.
The new model o1 launched by OpenAI has powerful reasoning capabilities. Visual China image
On September 12 local time, OpenAI launched o1, a new generation of models. Compared with its predecessors, o1 demonstrates strong reasoning ability: on benchmark tasks in physics, chemistry, and biology, it performs on par with a doctoral student, an ability earlier models lacked.
On September 13, Xiao Yanghua, professor and doctoral supervisor at the School of Computer Science and Technology of Fudan University and director of the Shanghai Key Laboratory of Data Science, told The Paper that the arrival of o1 means the reasoning ability of large models can fully reach expert level. He called it a milestone for artificial intelligence, one that will greatly expand the use of models in enterprises.
But he also acknowledged that as models keep improving in intellect, sensibility, and rationality, they will surpass human capabilities, and it remains difficult to predict what impact artificial intelligence will have on humans. "The development of artificial intelligence now outpaces human understanding of it. AI governance will be a huge challenge," Xiao Yanghua said.
Good at reasoning through complex tasks, with performance close to PhD level
As an early model, the new reasoning model o1 does not yet have all of ChatGPT's features, such as browsing the web or uploading files and images. Still, OpenAI says it is a major advance for complex reasoning tasks and represents a new level of AI capability.
"Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes." According to OpenAI, a large-scale reinforcement learning algorithm teaches the model to think productively using its chain of thought, and o1 generates a long internal chain of thought before responding to the user. With more reinforcement learning and more thinking time, o1's performance keeps improving. It learns to break tricky steps into simpler ones, and when the current approach doesn't work, it tries another.
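The behavior OpenAI describes — trying a cheap strategy first, recognizing failure, and switching to another while keeping an internal record of steps — can be sketched with a toy solver. This is my own illustration of the general idea, not OpenAI's algorithm:

```python
# Toy illustration: a solver that records an internal "chain of thought"
# and falls back to a different strategy when the first one fails.
import math

def solve_quadratic(a, b, c):
    """Return (roots, chain): real roots of a*x^2 + b*x + c = 0,
    plus the reasoning steps tried along the way."""
    chain = []
    # Strategy 1: try small integer roots (cheap, often fails).
    chain.append("try integer roots in [-10, 10]")
    roots = [x for x in range(-10, 11) if a * x * x + b * x + c == 0]
    if roots:
        chain.append(f"found integer roots {roots}")
        return roots, chain
    # Strategy 2: recognize the failure and switch to the general formula.
    chain.append("no integer root found -> fall back to quadratic formula")
    disc = b * b - 4 * a * c
    if disc < 0:
        chain.append("negative discriminant: no real roots")
        return [], chain
    r1 = (-b + math.sqrt(disc)) / (2 * a)
    r2 = (-b - math.sqrt(disc)) / (2 * a)
    chain.append(f"roots {r1} and {r2}")
    return sorted({r1, r2}), chain
```

The `chain` list plays the role of the hidden chain of thought: it is produced before the final answer and records both the failed attempt and the switch of strategy.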
The new reasoning model o1 performs well in mathematics and programming and is good at accurately generating and debugging complex code. OpenAI evaluated its math performance on the AIME (American Invitational Mathematics Examination). On the 2024 AIME, GPT-4o solved only 12% (1.8/15) of problems on average, while o1 averaged 74% (11.1/15) with a single sample per problem. On a qualifying exam for the International Mathematical Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the new reasoning model scored 83%.
The new model can reason through complex tasks and performs comparably to PhD students on benchmark tasks in physics, chemistry, and biology. OpenAI ran the GPQA Diamond benchmark in chemistry, physics, and biology, recruiting experts with PhDs to answer the same questions so the model could be compared with humans.
"We found that o1 outperformed those human experts, becoming the first model to do so on this benchmark. These results do not mean that o1 is more capable than a PhD in all respects, only that the model is more proficient at solving some of the problems a PhD is expected to solve," OpenAI said. Healthcare researchers can use o1 to annotate cell sequencing data, physicists can use it to generate the complex mathematical formulas required for quantum optics, and developers in every field can use it to build and execute multi-step workflows.
Milestone-level reasoning capabilities will greatly improve applications
"Previous large language models were more like liberal-arts students, still far from the level of a science student. But the core of human intelligence is thinking, and OpenAI's new o1 series of reasoning models exhibits a human-like thinking process." Xiao Yanghua said that o1 is in essence still a large language model, but one that fully taps the potential of large models. In the past, a large model's generative ability was determined by its corpus, in the spirit of the saying "read three hundred Tang poems well, and you can recite even if you cannot compose." Expert-level reasoning, however, cannot be achieved by drilling through huge numbers of practice problems; it requires genuine thinking ability. The difficulty of training reasoning ability into large models is that most human thinking processes are never written down, so data on thinking processes is extremely scarce. He speculated that OpenAI likely used a large amount of synthetic data this time.
"OpenAI has a clear first-mover advantage: its base model is stronger, it has collected more thinking-process data, it screens and synthesizes large amounts of high-quality thinking data, and it has strong evaluation capabilities. Deciding which reasoning processes are correct and which are wrong requires reinforcement learning, which is essentially a process of exploration and trial and error — if one method doesn't work, try another." With these techniques and data, Xiao Yanghua said, OpenAI has turned the large model into a true science student at expert level.
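One publicly known recipe that fits this description — sample many candidate reasoning chains and keep only those a verifier accepts — can be sketched as follows. The "model" and the verifier here are toy stand-ins; this is a sketch of the general technique, not a description of OpenAI's actual pipeline:

```python
# Sketch of rejection sampling for synthetic reasoning data: a stand-in
# "model" proposes candidate chains; a verifier keeps only those whose
# final answer checks out, yielding higher-quality synthetic training data.
import random

def propose_chain(question, rng):
    """Toy stand-in for a language model: guesses an answer with fake steps."""
    guess = rng.randint(0, 20)
    return {"steps": [f"try candidate {guess}"], "answer": guess}

def verified(question, answer):
    """Toy verifier: questions here are 'a+b' strings we can check exactly."""
    a, b = map(int, question.split("+"))
    return answer == a + b

def synthesize(questions, samples_per_q=200, seed=0):
    """Keep only chains whose final answers the verifier accepts."""
    rng = random.Random(seed)
    kept = []
    for q in questions:
        for _ in range(samples_per_q):
            chain = propose_chain(q, rng)
            if verified(q, chain["answer"]):
                kept.append((q, chain))
                break  # one verified chain per question is enough here
    return kept

data = synthesize(["3+4", "5+9"])
```

In a real pipeline the proposer would be a strong base model, the verifier a checker such as a unit test or an answer key, and the kept chains would become training data — which is why evaluation capability matters as much as generation.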
Chen Yunwen, chairman of Daguan Data, said that earlier models could not work through complex advanced-mathematics problems, so o1's strengthened mathematical and reasoning capabilities are a major improvement. The gain in mathematical ability, however, does not mean the research-and-development paradigm of large models has fundamentally changed; it is a targeted fix for earlier shortcomings.
In Xiao Yanghua's view, the emergence of o1 was not unexpected. "We actually judged very early that large models would develop stronger emotional and stronger rational abilities. The surprise was seeing it so soon, and working so impressively." He believes OpenAI may go on to differentiate its general-purpose model into many large models, each good at different things.
For example, earlier versions of GPT-4 were versed in knowledge and facts, emphasizing intellectual ability; GPT-4o focused on multimodal interaction, emphasizing emotional ability; the o1 series focuses on thinking, emphasizing rational ability. Improved rational ability will bring great growth to the to-B (enterprise) sector. "The biggest pain point and bottleneck in to-B applications is the reasoning ability of large models. The arrival of the o1 series of reasoning models means many problems in the to-B sector can be greatly alleviated in the future."
Challenges brought by the rapid development of artificial intelligence
"OpenAI is truly remarkable. Its technical route has not gone beyond what we can conceive — we all know the directions for large models, including multimodality and improved reasoning — but only OpenAI has turned them into reality so quickly. They train large models the way humans are educated. They have very strong ideas about intellectual and cognitive development and a very clear understanding of human growth and evolution, and so far they have not taken a wrong step," Xiao Yanghua said.
OpenAI has an obvious first-mover advantage. For the development of domestic large models, "OpenAI's advantages are our disadvantages. We need to calm down and catch up steadily; on the track to general artificial intelligence there is only a first place, no second." In the long run, though, Xiao Yanghua said, any single capability of large models has a ceiling, because genuine original human data is limited and generated slowly. "At present, OpenAI uses human data to synthesize new data to strengthen reasoning. But synthetic data is bounded by the original data: it cannot be synthesized without limit, nor can it yield essentially novel data. It cannot invent new disciplines or propose new theories the way Einstein did." On the hardware side, inference requires less computing power than training, but as chains of thought grow longer, the demands on inference efficiency rise, placing higher requirements on accelerating and optimizing the inference process.
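The point about longer thinking chains raising inference demands can be made concrete with a back-of-envelope estimate. The parameter count, token lengths, and the ~2-FLOPs-per-parameter-per-token rule of thumb below are my own assumptions, not figures from the article:

```python
# Back-of-envelope sketch: decoding cost grows roughly linearly with the
# number of generated tokens, so a long hidden chain of thought multiplies
# per-answer inference cost even though each token is cheap versus training.
def decode_flops(n_params, n_tokens):
    """~2 FLOPs per parameter per generated token (common rough estimate)."""
    return 2 * n_params * n_tokens

params = 70e9                            # hypothetical 70B-parameter model
short = decode_flops(params, 200)        # direct answer: ~200 tokens
long = decode_flops(params, 200 + 4000)  # answer preceded by a 4k-token chain
ratio = long / short                     # 4200 / 200 = 21x more compute
```

Under these toy numbers, the chain-of-thought answer costs 21 times the direct answer — which is why serving such models pushes hard on inference acceleration.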
As large models improve across many capabilities, however, they pose a governance challenge: human understanding of them improves more slowly than they develop.
The philosopher Kant divided the human cognitive process into three stages: sensibility, intellect, and rationality. All three are now improving in large models and are likely to surpass humans; few people are strong in all three kinds of cognition.
"At present, o1 has reached the level of a doctoral student; reaching the level of a scientist will be only a matter of quantitative change. Humans will gradually fall into a cognitive blind spot about the development of artificial intelligence. For example, what does the current reasoning ability of large models really signify? The proportion of people who can truly match AI's level of knowledge will only shrink. Almost no one in the world reaches doctoral level in mathematics, physics, and chemistry, or Mathematical Olympiad level, all at once. How many of us can understand, recognize, and control AI?" Xiao Yanghua said that humans currently lack a basic cognitive framework for artificial intelligence, which is a huge governance challenge. Topics such as employment, the economy, ethics, and social relations will provoke extensive debate. "Humans are the ones who unlock AI's magic. If AI gains superhuman capabilities, humans may well be unable to activate those powers, because doing so lies beyond human cognition."
The Paper reporter Zhang Jing
(This article is from The Paper. For more original news, please download the "The Paper" app.)