news

openai o1 shows self-awareness? terence tao astonished by hands-on tests, and o1 tops the model list with a mensa iq of 100

2024-09-14


openai o1 took first place in the iq test!

maxim lott, a well-known figure in this space, ran iq tests on o1, claude-3 opus, gemini, gpt-4, grok-2, llama-3.1, and others. the results showed that o1 ranked first.

following closely behind are claude-3 opus and bing copilot, which took second and third place respectively.

note that this set of questions comes from an offline iq test for mensa members and does not appear in any ai training data, so the results are a meaningful reference.

the famous mathematician terence tao also ran his own tests on o1 and found that, when asked a vaguely worded mathematical problem, the model was able to identify cramer's theorem.

coincidentally, just after the release of o1, mark chen, openai's vice president of research, expressed the view that today's large neural networks may have enough computing power to show some signs of consciousness in tests.

there is now a long list of industry leaders who believe that ai has consciousness, including but not limited to:

geoffrey hinton (godfather of artificial intelligence, the most cited ai scientist)

ilya sutskever (3rd most cited ai scientist)

andrej karpathy

today, many in the industry believe that ai has consciousness and are waiting for the overton window to shift further so that the public becomes willing to accept it.

some even predict that ai will become conscious in 2024 or 2025, arguing that the behavior of today's models already clearly demonstrates perception.

some netizens have found that o1 is not only strong in stem subjects; it can even hypothesize a completely new theory of consciousness.

some people think that the small step o1 has taken toward an infinite reasoning model already shows the rudiments of consciousness.

terence tao: o1 can actually recognize cramer's theorem

in hands-on testing, terence tao found that the o1 model delivers stronger mathematical performance!

first, he posed a vaguely worded mathematical problem that could be solved if one searched the literature and found the right theorem, cramer's theorem.

in previous experiments, gpt was able to mention some related concepts, but the details were all fabricated and meaningless.

this time, o1 successfully identified cramer's theorem and gave a satisfactory answer.

full answer: https://shorturl.at/wwru2

in the next example, the problem posed is a more challenging one in complex analysis, and here too the result is better than with previous gpt-series models.

with plenty of prompting and guidance, o1 was able to produce a correct and well-articulated solution, but it was unable to generate the key conceptual ideas on its own and made obvious mistakes.

tao described the experience as roughly equivalent to advising a mediocre but not completely incompetent graduate student who can do some of the work, whereas earlier gpt models gave the impression of a student entirely incapable of doing it.

it may take only one or two more iterations, together with integration of other tools such as computer algebra packages and proof assistants, to turn o1 into a "competent graduate student" that can play an important role in research tasks.

full answer: https://shorturl.at/zrjyk

in the third experiment, tao asked the o1 model to formalize a theorem in the proof assistant lean. the task required first decomposing the theorem into sub-lemmas and giving each a formal statement, with no proofs required.

specifically, the theorem establishes one form of the prime number theorem as a corollary of another form.
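
to make the shape of this task concrete, here is a minimal hypothetical sketch of that kind of decomposition, with placeholder propositions standing in for the two forms of the theorem and every proof left as `sorry` (an illustration only, not tao's actual prompt or o1's output):

```lean
-- hypothetical sketch of the task structure: state the target theorem, split it
-- into sub-lemma statements, and leave every proof as `sorry` (statements only).
-- the propositions below are placeholders, not the actual statements used.

-- stand-ins for two equivalent forms of the prime number theorem
-- (e.g. ψ(x) ~ x and π(x) ~ x / log x)
axiom ChebyshevForm : Prop
axiom PiForm : Prop

-- sub-lemma: an argument (e.g. partial summation) relating the two forms
theorem sub_lemma_relating_forms : ChebyshevForm → PiForm := by
  sorry

-- the corollary itself: one form of the prime number theorem follows from the other
theorem pnt_one_form_implies_other (h : ChebyshevForm) : PiForm :=
  sub_lemma_relating_forms h
```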

the results of this experiment were also quite good: the model understood the task and produced a reasonable initial decomposition of the problem.

however, probably because the training data lacks up-to-date material on lean and its mathematics library, the generated code contained several errors.

although flaws remain, the results of this experiment already give a glimpse of o1's practical application in mathematical research.

similar models, if fine-tuned for lean and mathlib and integrated into integrated development environments (ides), would be extremely useful in formalization projects.

in many previous talks, terence tao has repeatedly emphasized the application of ai tools to the formalization of theorems. it seems his prediction may come true once again.

full answer: https://shorturl.at/ogtjt

computer science professor uses animation to explain: how does o1 spend more time thinking?

what breakthrough in learning to think longer with cot led to this critical improvement? at present, we can only guess based on the available information.

for example, based on the available information and his own understanding, tom yeh, a professor of computer science at the university of colorado boulder, made an animation to explain how openai trained the o1 model to spend more time thinking.

regarding training, there is a very brief sentence in the report:

“through reinforcement learning, o1 learned to hone its chain of thought and refine its strategies.”

the two key words in this sentence are: reinforcement learning (rl) and chain of thought (cot).

in rlhf + cot, the cot tokens are also fed into the reward model to obtain a score that is used to update the llm for better alignment; in traditional rlhf, the input contains only the prompt and the model's response.
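
as a rough illustration of this difference (a sketch following the professor's speculation, not openai's disclosed implementation; `reward_model` stands for a hypothetical scoring callable), the only change is whether the cot text is included in what the reward model scores:

```python
# sketch of the scoring difference between traditional rlhf and rlhf + cot,
# following the article's description; `reward_model` is a hypothetical callable
# that maps a text sequence to a scalar reward.

def score_traditional_rlhf(reward_model, prompt: str, response: str) -> float:
    # traditional rlhf: the reward model sees only the prompt and the final response
    return reward_model(prompt + response)

def score_rlhf_with_cot(reward_model, prompt: str, cot: str, response: str) -> float:
    # rlhf + cot: the chain-of-thought tokens are scored too, so the policy update
    # can reward good reasoning rather than only a good-looking final answer
    return reward_model(prompt + cot + response)
```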

during the inference phase, the model first generates cot tokens (which can take up to 30 seconds) before starting to generate the final response. this is how the model spends more time “thinking”.
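
sketched as code, this two-phase decoding might look like the following; the delimiter token and the `model` callable are hypothetical illustrations, not openai's actual interface:

```python
# sketch of two-phase inference: emit chain-of-thought tokens first, then the
# user-visible answer. `model` is a hypothetical callable that decodes text from a
# prompt until a stop token; the delimiter token name is invented for illustration.

COT_END = "<|end_of_thought|>"  # hypothetical token marking the end of the hidden cot

def answer_with_thinking(model, prompt: str) -> str:
    # phase 1: spend extra decoding steps on the chain of thought (the "thinking" time)
    cot = model(prompt, stop=COT_END, max_tokens=8192)
    # phase 2: condition on the prompt plus the thought to produce the final response
    return model(prompt + cot, stop=None, max_tokens=1024)
```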

among the contributors listed in the report, two are worth noting:

ilya sutskever, one of the key figures behind reinforcement learning from human feedback (rlhf). his name on the list suggests that rlhf was still used when training the o1 model.

jason wei, author of the famous chain-of-thought paper, who left google brain to join openai last year. his presence suggests that cot is now an important part of the rlhf alignment process.

however, openai did not disclose many important technical details, such as how the reward model is trained and how human preferences about the "thinking process" are obtained.

disclaimer: the animation only represents the professor's reasonable speculation and is not guaranteed to be accurate.

the team shares a celebration video and its "aha" moments

the video below gives us more clues about the moment when an important breakthrough was made in the research.

after the launch of o1, openai released a video made by the team behind the model.