
o1's complete chain of thought has become OpenAI's number one taboo! Ask about it too many times and your account may get banned

2024-09-14


Warning! Don't ask the latest o1 model in ChatGPT how it thinks:

Just a few attempts and OpenAI will send an email threatening to revoke your access.

Please cease this activity and ensure your use of ChatGPT complies with our Terms of Use. Violations may result in loss of OpenAI o1 access.

Less than 24 hours after the release of o1, the new large-model paradigm, many users reported receiving this warning email, sparking widespread dissatisfaction.

Some reported that they received a warning whenever their prompt contained keywords such as "reasoning trace" or "show your chain of thought".

Even avoiding the keywords entirely and using other tricks to coax the model into bypassing the restriction gets detected.

Some also claimed their accounts were actually banned for a week.

These users were trying to get o1 to repeat its complete internal thought process, that is, all of the raw reasoning tokens.

Currently, what anyone can see by clicking the expand button in the ChatGPT interface is only a summary of the original thought process.

In fact, when o1 was released, OpenAI gave its reasons for hiding the model's complete thought process.

To summarize: OpenAI needs to monitor the model's thought process internally, so no safety restrictions can be applied to these raw tokens, which makes them unsuitable to show to users.

However, not everyone accepts this reasoning.

Some pointed out that o1's thought process would be the best possible training data for other models, so OpenAI does not want this valuable data to be harvested by other companies.

Others think this shows that o1 really has no moat: once the thinking process is exposed, it could easily be copied.

And: "Does this mean we just have to blindly trust the AI's answers without any explanation?"

Very little has been revealed about the technical principles behind the o1 model; the only useful information is that it "used reinforcement learning."

In short, OpenAI is becoming less and less open.

o1 is Strawberry, but not GPT-5

It is now certain that o1 is the "Strawberry" that OpenAI has long been hyping, or at least a model built with the method that "Strawberry" represents.

But can it be considered the next-generation model GPT-5, or just a GPT-4.x?

More and more people suspect it is merely an engineering refinement on top of GPT-4o.

The well-known leaker account Flowers (formerly Flowers from the Future) said that OpenAI employees internally call o1 "4o with reasoning".

He also claimed that many OpenAI employees quietly liked this revelation, and that the screenshot above also came from an OpenAI employee.

But Musk recently changed X so that no one except the original poster can see who liked a post, so this claim cannot be confirmed at the moment.

Flowers also asked a follow-up question in the "Ask Me Anything" event just held by the OpenAI developer account.

OpenAI employees answered many questions there, but dodged this one, even though it received many likes and ranked near the top.

Even Altman himself merely played the riddler again, hinting that "Strawberry" has run its course and that new models under the next codename, Orion, are on the way.

Earlier reports said that "Orion" is OpenAI's next-generation flagship model, trained on synthetic data generated by "Strawberry", that is, o1.

Orion is also one of the "winter constellations" Altman has alluded to.

Returning to the released o1, another criticism surrounding it is that it "does not meet scientific research standards".

For example, it cites no prior work on inference-time computation, and it lacks comparisons with other companies' most advanced models.

On the first point, some noted that OpenAI is no longer a research laboratory and should be regarded as a commercial company.

It only occasionally still poses as a research lab in order to recruit people who want to do research.

As for the second point, now that the API has been released, whether comparisons happen is no longer up to OpenAI, and many third-party benchmarks have already produced results.

In the $1 million ARC Prize competition run by the father of Keras, both the o1-preview and o1-mini versions surpassed OpenAI's own GPT-4o on the public test set.

But o1-preview only managed a tie with Claude 3.5 Sonnet.

As for the coding ability that o1's promotion emphasized, the team behind the open-source pair-programming tool Aider ran its own tests, and the o1 series showed no clear advantage there either.

On the whole-file code rewriting task, o1-preview scored 79.7 points versus 75.2 for Claude 3.5 Sonnet, a 4.5-point lead for o1.

But on the more practical code-editing task, o1-preview lagged Claude 3.5 Sonnet by 2.2 points.

The Aider team also pointed out that replacing Claude with the o1 series for programming would cost far more.

The team behind the "AI programmer" Devin, an OpenAI partner, had already obtained o1 access in advance.

In their tests, an o1-powered base version of Devin achieved significant improvements over the GPT-4o-powered one.

But there is still a big gap compared to the released production version of Devin, mainly because the production version was trained on proprietary data.

The Devin team also reported that o1 usually backtracks and considers different options before arriving at the correct solution, and is less likely to hallucinate or make confident mistakes.

When using o1-preview, Devin is more likely to correctly diagnose the root cause of a bug rather than just patch the symptoms.

On LiveBench, which puts more weight on mathematics and logical reasoning, o1-preview ranks first, overtaking Claude 3.5 Sonnet by a clear margin despite trailing on the coding sub-category.

The LiveBench team noted that these are only preliminary results, because many tests have built-in prompts such as "please think step by step", which is not the best way to use o1.

In SuperCLUE's high-level reasoning test on complex Chinese tasks, a comprehensive benchmark for Chinese large models, o1-preview's reasoning ability is also far ahead.

Finally, a few things to note when using the o1 model:

The cost is very high: 1 million output tokens cost $60, a price that overnight takes us back to the GPT-3 era.

Hidden reasoning tokens are also counted as output tokens; you can't see them, but you have to pay for them.

For most tasks, it is best to try GPT-4o first and switch to o1 only when that proves insufficient, to save costs.

For code tasks, Claude 3.5 Sonnet remains the first choice.
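To make the billing point above concrete, here is a minimal sketch of how o1 API costs add up, assuming the article's rate of $60 per 1M output tokens; the $15 per 1M input rate and the token counts in the example are illustrative assumptions, not figures from the article.

```python
# Illustrative o1 billing sketch. OUTPUT_RATE follows the article
# ($60 per 1M output tokens); INPUT_RATE is an assumed figure.
INPUT_RATE = 15 / 1_000_000   # assumed $/token for input
OUTPUT_RATE = 60 / 1_000_000  # $/token for output, per the article

def o1_cost(input_tokens, visible_output_tokens, hidden_reasoning_tokens):
    # Hidden reasoning tokens are billed as output tokens,
    # even though the user never sees them.
    billed_output = visible_output_tokens + hidden_reasoning_tokens
    return input_tokens * INPUT_RATE + billed_output * OUTPUT_RATE

# A response with only 500 visible tokens but 5,000 hidden reasoning
# tokens is billed for 5,500 output tokens.
cost = o1_cost(1_000, 500, 5_000)
print(f"${cost:.4f}")  # prints $0.3450
```

The key takeaway: the visible answer can be a small fraction of what you are billed for, since the reasoning tokens dominate the output count.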

In short, the developer community still has many questions about OpenAI's new o1 model.

o1 has opened up a new paradigm for advanced reasoning in AI, but it is far from perfect, and how to extract the most value from it remains to be explored.

Against this backdrop, the "Ask Me Anything" event held by OpenAI received hundreds of questions within four hours.

Below is a selection and summary of the whole event.

OpenAI employees "answer everything"

First of all, regarding this suddenly released new model, many people were curious why OpenAI gave it a name like o1.

In OpenAI's view, o1 represents a new level of AI capability, so the "counter" was reset to 1, and the o stands for OpenAI.

As Altman said when o1 was released, o1, capable of complex reasoning, is the beginning of a new paradigm.

Regarding the preview and mini versions, OpenAI scientists also confirmed some of the netizens' speculations:

preview is a temporary version and the official release will come later (the preview is in fact an early checkpoint of o1); the mini version is not guaranteed to be updated in the near future.

This becomes even clearer when combined with a chart previously released by OpenAI member Kevin Lu.

Compared to preview, mini performs well on some tasks, especially code-related ones, and can explore more chains of thought, but has relatively less world knowledge.

OpenAI scientist Shengjia Zhao explained that mini is a highly specialized model that focuses on a few capabilities, allowing it to go deeper on them.

This also resolves the riddle Altman had been teasing on this point.

As for how o1 works, OpenAI scientist Noam Brown made clear that it is not a "system" of model plus CoT scaffolding, as some netizens believed, but a single model trained to natively generate chains of thought.

However, the chain of thought produced during reasoning is hidden, and OpenAI has stated that there is no plan to show the raw tokens to users.

The little OpenAI has revealed is that the CoT tokens users see are summaries and are not guaranteed to faithfully match the actual reasoning process.

Beyond the reasoning itself, the Q&A also revealed that o1 can handle longer context than GPT-4o, with further increases to come.

On capability, OpenAI's internal tests show o1 exhibiting philosophical reasoning: it can ponder questions such as "what is life?"

Researchers also used o1 to build a GitHub bot that pings the appropriate code owners for review.

Of course, on some non-reasoning tasks such as creative writing, o1 is not significantly better than GPT-4o, and is sometimes even slightly worse.

In addition, in response to some questions, OpenAI said it is researching or plans to research several unreleased features that netizens care about, with no firm launch dates:

Tool calls are not supported yet, but function calling and a code interpreter are planned.

Future API updates will add structured output, system prompts, and prompt caching.

Fine-tuning is also planned.

API users will be able to set their own limits on reasoning time and token consumption.

o1 has multimodal capabilities, and the goal is SOTA on datasets such as MMMU.

On performance, OpenAI is also working to reduce latency and the time required for reasoning.

Finally, people, especially API users, care about the price; after all, with the reasoning process billed as output tokens, o1's pricing is still relatively high.

OpenAI says it "will follow the trend of cutting prices every 1-2 years", and batch API pricing will become available once usage limits are relaxed.

Plus users on the web/app are currently limited to 30 o1-preview and 50 o1-mini messages per week.

The good news: just this morning, after widespread enthusiasm for o1 caused many people to burn through their quota quickly, OpenAI made an exception and reset the limits.
