
"programming as a profession ends today", the terrifying thing about openai's new model o1

2024-09-15


author: fanfan, editor: odette, title image from: ai generated

remember the big shake-up at the top of openai?

the project called q* (pronounced q-star) led to the dismissal of sam altman, the resignation of co-founder greg brockman, and the intensification of internal conflicts within openai.

according to people familiar with the matter, the q* project had already made significant progress at the time and could solve basic math problems. unlike a calculator, which handles only a limited set of operations, and unlike gpt-4, which may give a different answer to the same question each time, q* appeared to have the ability to generalize, learn, and understand, a key step towards agi. openai researchers wrote a letter to the board of directors warning that q*'s major discovery could threaten all of humanity, and that sam altman had concealed it.

there was a huge upheaval within openai, but openai itself never directly addressed the existence of q*.

today, openai suddenly released a new model, which is still a preview version. it is the legendary q*, later code-named "strawberry" and now openai o1-preview.

a new reasoning model for solving complex problems, which is not the same as chatgpt|openai

o, or "omini", the all-encompassing o, but according to openai, this model "represents a new height of artificial intelligence" and is very different from the previous large models in working method, so it can be established as a new series and start again from 1 (gpt5: i'm old!).

as for whether this model will "threaten humanity", as ilya sutskever and other former core openai scientists who have since parted ways with the company warned, and push humanity into the era of agi (artificial general intelligence) without adequate moral constraints, read on and judge for yourself.

o1, outperforms everything

first up is the familiar benchmark segment.

every new generation of large models reaches unprecedented heights on release, but this time o1 is different at a more fundamental level.

most popular large models today take the form of chatbots whose reasoning paths are hard to explain and whose roadmap is multimodality (speaking, seeing, and listening), growing ever more human-like in tone and response. o1 is different from them.

first of all, its goal is not to get faster and faster; if anything, it is to get slower and slower.

openai scientist noam brown said that o1 can currently produce an answer within seconds, but in the future it will be able to think for hours, days, or even weeks. he attached a picture of o1 working out a diagnosis for a medical case after thinking for just over ten seconds. his implication: a longer reasoning time means the model can build a longer chain of thought and think more deeply.

secondly, o1 breaks through the achilles' heel of previous large language models: mathematics.

aime, the american invitational mathematics examination, is easier than the olympiad but much harder than the sat; it is generally used to select the best high-school mathematics students in the united states. gpt-4o scored only 12 points on the invitational questions, while o1 scored 74 points in a single attempt. if 1,000 samples are drawn per problem and a scoring function re-ranks those 1,000 samples (which better reflects the model's expected level), o1 reaches 93 points, enough to rank among the top 500 students in the united states and qualify for the usa mathematical olympiad.

comparison of the performance of o1 and gpt-4o, great progress in mathematics|openai
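for readers wondering what "drawing 1,000 samples and re-ranking them with a scoring function" means in practice, here is a minimal best-of-n sketch. generate_answer() and score() are hypothetical stand-ins; openai's actual sampler and learned scoring function are not public.

```python
# minimal best-of-n re-ranking sketch. generate_answer() and score() are
# hypothetical stand-ins for the model sampler and the learned scoring
# function described above; openai's real pipeline is not public.
import random

def generate_answer(problem: str) -> str:
    """pretend to sample one answer from the model."""
    return f"candidate {random.randint(0, 999)} for: {problem}"

def score(problem: str, answer: str) -> float:
    """pretend to score how promising a candidate answer looks."""
    return random.random()

def best_of_n(problem: str, n: int = 1000) -> str:
    """sample n answers, re-rank them, and keep the highest-scoring one."""
    candidates = [generate_answer(problem) for _ in range(n)]
    return max(candidates, key=lambda ans: score(problem, ans))

print(best_of_n("aime practice problem", n=1000))
```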

o1 also sat the problems of the 2024 international olympiad in informatics (ioi). within 10 hours, and with a maximum of 50 submissions allowed per problem, it scored 213 points, placing in the 49th percentile among human contestants. if the number of submissions is raised to 10,000 per problem, o1 reaches 362.14 points, enough for an ioi gold medal and admission to tsinghua university.

in the actual test, a fine-tuned version of o1 was used, not the preview version we can use|openai

there are many other tests besides. for example, on gpqa (a benchmark of graduate-level physics, chemistry, and biology questions), o1 outperforms phds in the relevant fields on some questions.

in short, o1's goal is not to pile up further advantages where large models are already strong, but to break through on the complex logic that large language models have never been good at.

one step back, two steps forward

as mentioned above, o1's reaction speed has slowed down.

it spends more time thinking before it reacts, and it keeps refining its thought process, trying different strategies, and learning from its mistakes. that’s scary.

moreover, o1 is not a multimodal model, at least for now. openai spent two years teaching its large models to see and hear, but o1 returns to the original simplicity: it accepts only text input.

slower and more stripped-down: for o1, it's one step back and two steps forward. people who have used o1 say it is the smartest model they have ever used, and that conversations with it go well beyond the small talk of earlier chatbots.

in one test, the user asked o1 a logical paradox question: "how many words are there in the answer to this question?"

o1 thought for ten seconds and showed its thinking process. first, it recognized this as a self-referential paradox, or a recursive problem: until the answer is fixed, the number of words in the answer cannot be determined ("avoiding unnecessary phrasing is important for clarity and brevity of answers"). the next step was to count words, which means the answer has to match its own word count. it then listed a series of candidate sentences in search of one that fits, found that "this sentence has five words" does contain five words, and concluded that restructuring it into a complete answer meant five should be replaced by seven.

so it replied: "there are seven words in this sentence."

this reasoning process is not much different from my reasoning process|x

in another example, o1 answered the simple question "how many r's are there in strawberry?" in 5.6 seconds, using 631 tokens.
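as a contrast, this kind of counting question is a one-liner in ordinary code, which is exactly why it is used to probe the token-level blind spots of language models. the snippet below is just an illustration, not part of the original test:

```python
# counting letters is trivial for plain code; language models struggle
# because they see tokens rather than individual characters.
print("strawberry".count("r"))  # prints 3
```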

from the examples above, we can see that o1 works in a fundamentally different way from chatgpt. o1 introduces reasoning tokens: it splits a problem into multiple steps, thinks them through separately, and then removes the reasoning tokens when generating the final answer.

the following diagram shows how the chain of thought works, which also explains why o1's response speed has slowed down.
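the hidden chain of thought also surfaces in the api, if only as a token count. below is a minimal sketch assuming the openai python sdk and the o1-preview chat completions endpoint as documented at launch; the usage fields are the ones openai published for o1 and may change.

```python
# minimal sketch: reasoning tokens are never returned as text, but they are
# counted (and billed) in the usage block. assumes the openai python sdk and
# the o1-preview endpoint as documented at launch.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "how many r's are there in strawberry?"},
    ],
)

# the visible answer, with the reasoning tokens already stripped out
print(response.choices[0].message.content)

# the hidden reasoning shows up only as a count
usage = response.usage
print("completion tokens:", usage.completion_tokens)
print("of which reasoning tokens:", usage.completion_tokens_details.reasoning_tokens)
```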

when using o1, you may want to test its ability with some classic logic and math problems.

for simple questions, the difference between doing multiple rounds of reasoning and skipping them may not be obvious, but for complex problems in coding, mathematics, and science, this kind of thinking ability is essential.

openai says that medical researchers can use o1 to annotate cell sequencing data, physicists can use it to generate the complex mathematical formulas required in quantum optics, and developers in every field can use it to build and execute multi-step workflows.

more importantly, this is the prototype of a thinking pattern, an early form of intelligence.

new model, new habits

since o1 works differently from chatgpt, the prompt-writing tutorials you have seen before no longer apply: for now, piling on description only burns a large number of tokens and does not necessarily lead to better results.

to make this clear to users, openai has written a new prompting guide. it explains that the best prompts for o1 are direct and concise; instructing the model to work step by step, or scattering hints across the prompt, can be counterproductive. here are a few of the official suggestions, with a small example after the list:

keep prompts brief and direct. models respond best to short, clear instructions that don’t require excessive coaching.

avoid chains of thought in prompts. o1 will do its own internal reasoning, so it is useless to guide it step by step or explain your thought path.

use delimiters to improve clarity. delimiters such as triple quotes ("""), xml-style tags (<>), or section markers (§) clearly separate the different parts of a prompt and help the model process each part correctly.

limit extra context in retrieval-augmented generation (rag). provide only the most relevant information, so the model does not overthink.
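putting the four suggestions together, a compliant prompt might look like the sketch below. the delimiters and the snippet of retrieved context are invented for illustration; this is not an official openai template.

```python
# a prompt that follows the suggestions above: brief and direct, no
# "think step by step", xml-style delimiters separating the parts, and only
# the most relevant retrieved context. the schema snippet is invented.
retrieved_context = (
    "users(id, email, created_at)\n"
    "orders(id, user_id, total)"
)

prompt = (
    "answer using only the context below.\n"
    "<context>\n"
    f"{retrieved_context}\n"
    "</context>\n"
    "<question>\n"
    "write one sql query returning each user's email and lifetime order total.\n"
    "</question>"
)

print(prompt)  # sent to o1 as a single user message
```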

when i read the third suggestion, the format felt familiar. in the future, programmers will most likely program in natural language: the basic instructions stay the same, they are just written in plain words. according to the latest guidelines, a good prompt will look like this:

or something like this:

§host§writer§bar owner§oil painter§leatherworker§silversmith§singer§hand drummer§backpacker§golden left face§french knight§zen disciple§

the rest is up to the model to figure out.

give me a minute to make a 3d snake

there is a reason to use snake as the example. less than a day after o1's release, people had already run many experiments with it, including a 3d snake game.

@ammaar reshi on x used extremely simple prompts to build a 3d snake game in just one minute, and o1 walked him through how to run the code step by step.

have you learned how to write prompts? |@ammaar reshi

although the effect is a bit crude, no one can say that it is not a snake.

and it’s quite fun|@ammaar reshi

user @james wade used o1 to build a data-analysis app that shows a brief description and an example of each statistical distribution. it took only 15 minutes, including deployment. he said he had never thought of building something like this before, because it would have been too much trouble.

the effect is as shown in the picture|@james wade

another user, @dallas lones, a full-stack engineer with 16 years of experience, built a react native full-stack app in just a few minutes and lamented that he had not started his own business sooner, now that his craft has become a relic of a past era. in his words, "programming as a profession officially ends today."

more people are pushing o1 to its limits, and some have already started playing a game of "whose question is tricky enough to make o1 think the longest."

currently, o1 is open to chatgpt plus and team users, while api access will first go to tier 5 developers who have spent more than $1,000 on the openai api. next, openai will gradually open the lighter o1-mini to free users.

will this be the sunset of humanity?