o1 Gold-Medal Team Reveals the Amazing Moment When AI Surpasses Humans! The Full 22-Minute Video Is Released

2024-09-22


New Intelligence Report

Editors: Taozi, Qiao Yang

[New Intelligence Guide] The birth of o1 was the most revolutionary moment for the OpenAI team. In the full 22-minute interview video, they share their thoughts on the new model and the development story behind it.

The full video of the OpenAI o1 team interview is finally online!

The 22-minute session is filled with "aha!" moments shared by the o1 R&D team, led by project lead Bob McGrew.

Some mentioned that the new o1 model is equivalent to a combination of multiple PhDs and often performs better than humans. Others said that after o1's release, they clearly felt the arrival of AGI.

"When models outperform humans in areas such as mathematics, coding, Go, and chess, the future of AGI becomes clearer."

Nathan Lambert, a scientist at the Allen Institute, gave a nice summary of the video's highlights.

There are eight points in total:

1. Powered by reinforcement learning, o1 is better than humans at discovering new CoT reasoning steps.

2. The emergence of self-criticism was o1's most powerful moment.

3. Asking o1 to finish its answer before a "timeout" led to a sudden "aha" moment.

4. Scaling parameters and continuing progress along the reinforcement-learning algorithm path remain challenges.

5. Many people mentioned how important infrastructure is compared to algorithms.

6. Through planning and error correction, o1 can solve genuinely new problems.

7. The new training paradigm is a completely new approach that puts more computing power into the model.

8. When o1 writes code, the code it outputs must pass unit tests before it is used.

Next, let's take a closer look at the story behind the o1 model.

Reinforcement learning + thinking: o1 opens a new paradigm

As OpenAI's new model series, o1 differs most from the GPT models in its reasoning.

It is essentially a reasoning model, which means it "thinks" more than its predecessors.

In the view of OpenAI's researchers, "thinking" is the most intuitive way to understand reasoning.

Sometimes, when asked what the capital of Italy is, we can produce an answer almost instantly, without thinking. But for things like business plans or novels, a long thinking process is needed.

Needless to say, the longer you think, the better the results.

Reasoning, therefore, is the ability to convert thinking time into better results.

In the words of Mark Chen, reasoning is a "primitive" that any reliable thinking process requires.
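The interview does not reveal how o1 turns extra thinking time into better answers, but a simple, well-known illustration of the same idea is self-consistency: sample several independent chains of thought and majority-vote on the final answer. Here is a minimal sketch in Python, where the model name and prompt format are illustrative assumptions, not details from the video:

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_answer(question: str) -> str:
    """Sample one chain-of-thought completion; treat its last line as the answer."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model works
        messages=[{
            "role": "user",
            "content": f"{question}\nThink step by step, "
                       f"then put only the final answer on the last line.",
        }],
        temperature=0.8,  # diversity across samples is the point
    )
    return resp.choices[0].message.content.strip().splitlines()[-1]

def self_consistency(question: str, n: int = 8) -> str:
    """More samples = more 'thinking time'; the majority answer usually wins."""
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("A train travels 120 km in 1.5 hours. What is its speed in km/h?"))
```

This is not o1's mechanism (o1 learns to reason via reinforcement learning); it is only a concrete demonstration that spending more inference-time compute can buy accuracy.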

OpenAI actually began researching reasoning very early. In the company's early days, the team saw the potential of AlphaGo, which defeated humans through RL algorithms, and did a lot of reinforcement-learning research of its own.

For example, in 2016 it released Universe, an open-source platform for training and measuring an AI's general intelligence across games.

In 2018 it built a game-playing AI called OpenAI Five, which went on to defeat OG, the team that won two of Dota 2's International championships.

At the same time, significant scaling progress was made in data and robotics.

The OpenAI team began to wonder: how can reinforcement learning be made to work in general domains to produce a very powerful AI?

Then came the new paradigm opened by the GPT series, which achieved amazing results by scaling unsupervised learning.

Since then, researchers have been exploring how to combine the two paradigms: reinforcement learning and unsupervised learning.

The researchers say it is hard to pin down exactly when the effort began, but it has been under way for a long time.

The "aha" moment

In the video, one researcher said the coolest thing about research is the "aha" moment.

At some point, an unexpected breakthrough occurs and everything suddenly becomes clear, as if in an epiphany.

So, what "aha" moments did the team members experience?

One said there was a critical moment during training when the team invested more computing power than before and the model generated a coherent chain of thought for the first time.

Everyone was pleasantly surprised: this model was clearly, significantly different from the previous ones.

Others said that when you think about training a model with reasoning capabilities, the first idea that comes to mind is to have humans record their thinking processes and train on those.

For one researcher, the aha moment came when he saw that the CoT generated and optimized by training the model with reinforcement learning was even better than the CoT written by humans.

That moment showed that the model's reasoning capabilities could be extended and explored along this path.

Another researcher said he had been working hard to improve the model's ability to solve mathematical problems.

What frustrated him was that after generating each result, the model never seemed to question what it had done wrong.

However, when they trained one of the early o1 models, they were surprised to find that its scores on math tests suddenly improved significantly.

Moreover, the researchers could read the model's reasoning process: it had begun to reflect on itself and question itself.

He exclaimed: we finally made something different!

The feeling was so strong that, at that moment, everything seemed to come together.

Another researcher said that asking the model to finish its thinking before a "timeout" makes for a very interesting process.

It's like a math competition: all thinking happens under a time limit.

He said that this was the main reason he entered the field of AI, and for him this was a "closing the loop" moment.
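The video does not say how this timeout works internally. In the public API, the closest analogue is capping the model's total generation budget; for the o1-series models the relevant parameter is max_completion_tokens, which counts the hidden reasoning tokens as well as the visible answer. A rough sketch:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Cap the model's combined thinking-plus-answer budget, roughly analogous
# to a competition time limit. If the budget is too small, the model may
# spend it all on reasoning and return an empty visible answer.
resp = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_completion_tokens=4096,
)
print(resp.choices[0].message.content)
```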

What is also amazing about the o1 model is how much it helps drive scientific discovery and engineering progress.

For many people, AGI seems an abstract, unattainable concept. Only when they see with their own eyes that AI does better than humans at the very things they are good at can they believe AGI is coming.

For professional chess and Go players, IBM's Deep Blue and DeepMind's AlphaGo and AlphaZero drove that realization home years ago.

For OpenAI's scientists, who excel at mathematics and coding, the o1 model carries a similar meaning. More interestingly, their work amounts to personally creating an AI that can crush their own abilities.

What difficulties did the team encounter during the project?

Regarding obstacles along the way, the researchers said plainly that training LLMs is fundamentally difficult.

It is like launching a rocket from the Earth to the Moon: there is only a narrow path to success but countless paths to failure, and a slight deviation in angle keeps you from reaching the goal.

There are thousands of ways a training run can go wrong, and even in the hands of this group of talented research scientists, every run hits hundreds of problems.

Furthermore, as models become increasingly intelligent (o1 is like a human with several PhDs), evaluation becomes increasingly difficult.

Sometimes they need a long time just to determine whether the model is doing the right thing. Eventually, many commonly used industry benchmarks became saturated, and they had to find new benchmarks suited to o1's capabilities.

Beyond the development process itself, the researchers were asked about their favorite use cases for the o1 model.

Hyung Won Chung said that o1 makes a good coding assistant.

He usually follows TDD (test-driven development) at work. With o1's help, he no longer has to write unit tests himself: he specifies the requirements directly and lets the model write the tests automatically.
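As a rough illustration of that workflow (the spec, model choice, and function under test are illustrative, not from the interview), one might hand the model a plain-language requirement and have it produce the pytest file that TDD starts from:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SPEC = """Write pytest unit tests for a function slugify(title: str) -> str that:
- lowercases the input,
- replaces runs of non-alphanumeric characters with single hyphens,
- strips leading and trailing hyphens.
Output only the contents of the Python test file, with no explanations."""

resp = client.chat.completions.create(
    model="o1-mini",  # illustrative choice of model
    messages=[{"role": "user", "content": SPEC}],
)

# Save the generated tests; TDD then proceeds by implementing slugify()
# until `pytest test_slugify.py` passes.
with open("test_slugify.py", "w") as f:
    f.write(resp.choices[0].message.content)
```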

In addition, error messages can be pasted straight into o1. Even when it cannot solve the problem outright, it asks better questions than the compiler does and helps you work through the error.

Jason Wei said he often uses o1 as a brainstorming partner on a wide range of issues, from how to solve a machine learning problem to how to draft a blog post or tweet.

A blog post he wrote in May this year on LLM evaluation drew on o1's input, on things like the article's structure, the pros and cons of various evaluation benchmarks, and writing style.

What is it like to work at OpenAI?

On this question, many talked about their colleagues' intelligence and the harmonious team atmosphere.

For example, code you have been debugging for a week gets fixed instantly by a colleague passing by; being around extremely smart people every day keeps you humble.

Mark Chen described the Strawberry project as very "organic," because everyone has their own views on professional questions and ideas they are passionate about pushing forward.

When these ideas come together, sparks fly and things snowball.

The other side of having strong opinions is that everyone is very persistent about their views, but not stubborn: if they see objective results that refute their opinions, they change their minds accordingly.

What is even more admirable is that this group of extremely smart people is also very kind and willing to help others solve problems. Colleagues eat together and go out together, which led many of the interviewed researchers to say frankly that "working here is a very good experience."

The story behind o1-mini

The motivation for releasing o1-mini was to give more researchers access to a lower-budget model that still has strong reasoning capabilities.

It can be called a "reasoning expert," smarter at reasoning than OpenAI's best previous models.

Moreover, its cost and latency are very low.

It may not know a given celebrity's date of birth, but it can reason effectively and carries a lot of wisdom.

OpenAI researchers said they will keep improving the algorithm to make it comparable to the best small models.

In addition, researchers around the world have kept investing in more compute and hardware, driving model costs down exponentially over a long period.

One shortcoming, however, was that not enough time had been spent looking for a new way to turn that situation around.

o1's new paradigm is that discovery: reasoning scaling, which also optimizes compute efficiency.

What is the motivation for doing research?

What brings this group of "smart brains" together and motivates them to do research?

One researcher said it is fascinating to think about the different ways he could make the model reason.

Others said, "Good things take time."

o1 answering so quickly today is just the first step toward a model that can think about problems for a long time. Moving to the next stage will take months or even years of further research.

"It's exciting and meaningful to think that a small number of us can have an impact that changes the world."

The most fascinating point is that the new paradigm unlocks tasks the model could not accomplish before: not just answering certain queries, but actually generalizing to new capabilities through planning and error correction.

Even more, o1 can generate new knowledge, which is the most exciting part of scientific discovery.

Before long, the researchers said, the model will become an increasingly powerful contributor to its own development.

Finally, the o1 lead asked: "Are there any other observations worth mentioning?"

Jason Wei shared: "An interesting observation is that each trained model is slightly different and has its own quirks, like a piece of handicraft. That uniqueness gives each model a touch of personality."

The full video is as follows: