
openai once again gives life to the big model "bubble"

2024-09-14


text | he qianming

editor: cheng manqi and huang junjie

openai's most important product this year, the o1 model, was released as scheduled. the stock price of nvidia, the biggest ai beneficiary, rose by 10% in two days.

o1, which consumes more computing power to answer questions, will "think" for tens of seconds or even longer before giving a reply. openai said that it far outperformed the existing large models on the market when answering mathematical olympiad questions or completing programming tasks.

but openai ceo sam altman's good mood was soon interrupted. under his tweet announcing the full launch of o1, the top comment was: "when can i use the new voice function??" he immediately countered: "can you spend a few weeks thanking this magical intelligence before asking for new toys?"

what this user asked altman for was not some new toy, but the end-to-end gpt-4o voice feature that openai promised in may this year. in the live demonstration at the time, the new ai voice was natural, responsive, and handled interruptions gracefully, making it hard to tell apart from a real person. by the official schedule, tens of millions of paying chatgpt users should have had the feature within a few weeks, but it has been delayed to this day.

over the past year, openai's products have all been similar "futures": gpt-4 has been online for more than a year, yet there is still no sign of its next-generation model, gpt-5. the video model sora, which openai released earlier this year, has not been opened to the public at scale; so far only a small number of selected industry professionals have actually used it.

the industry leader's missed deadlines have worn down the capital market's patience with large models. some chinese technology giants and large model companies suspended foundation model training in the middle of this year, shifting resources to application development or renting out gpu computing power to external customers. worried that there is little room left for technological advancement, they began cutting investment and chasing returns.

before this week, nvidia's market value had fallen more than 20% from its june peak, and microsoft's had shrunk by 13%, each company losing hundreds of billions of dollars. microsoft's cfo said it would take 15 years or more to earn back the tens of billions of dollars invested in the large model field.

sequoia's research shows that last year, investment in ai exceeded revenue by more than $120 billion, and this year the gap may widen to $500 billion. but apart from nvidia, few companies have seen revenue grow by a large margin. more and more people in the industry are beginning to discuss whether the ai bubble will burst again if the capabilities of large models only go this far.

"bubble" is not necessarily a bad thing. before new technologies change the world, there will always be a stage where the vision far exceeds the reality. the difference lies in whether the vision can be fulfilled and when it can be fulfilled. if it cannot be fulfilled for a long time, the bubble will burst and the company will go bankrupt. a serious bubble burst can even bring down a field or even multiple economies. if the vision is fulfilled, everything is just a footnote to technological progress.

the o1 released by openai will at least temporarily reverse the doubts that there is no room for improvement in large models, and extend the life of the large model "bubble".

any new technology needs to keep improving in order to change the world. o1's uniqueness is not just its much stronger performance in programming, mathematics, physics, and other fields; it also points a way forward for openai's followers and the investors behind them. in the past, computing power was mostly spent on "memorizing knowledge", that is, training models on vast amounts of data; o1 allocates more computing power to "thinking while answering", that is, the reasoning process, which greatly improves logical ability.

prior to this, large model training had hit a bottleneck under the original scaling laws: as model parameters scaled up, performance gains gradually slowed.

the o1-mini, specially optimized for mathematics, programming, and science problems, has also shown considerable application potential. it can directly help scientists and developers work more efficiently, and it points the way to building better-performing, safer models in other high-value vertical fields.

like its previous releases, openai carefully selected the timing of releasing o1. according to media reports, before the release of o1, openai was seeking $7 billion in new financing at a valuation of $150 billion, with potential investors including apple, nvidia, microsoft, and the uae investment fund. now, this resource investment race has one more reason to continue.

from large language models to "reasoning models", o1 is strong in science but weak in liberal arts

this time, openai released two models for users: o1-preview and o1-mini, and announced that more o1 series models are on the way.

the series is named o1 rather than gpt because the two are trained in significantly different ways. in the blog post introducing o1, openai repeatedly called it a "reasoning model", rather than the "large language model" label it has commonly used for the gpt series.

the overall training logic of traditional large language models such as gpt is pre-training plus fine-tuning. in the pre-training stage, massive data teaches the model to predict the next word. in the fine-tuning stage, the model learns knowledge in specific fields; humans write reference answers to teach it what kinds of answers people want, and the model improves based on that feedback.

the new keywords of o1 are reinforcement learning and chain of thought (cot).

openai describes the process like this: “through reinforcement learning, o1 learns how to hone its thought chain and refine the strategies it uses. it learns to recognize and correct its mistakes; break down complex steps into simpler ones; and try different approaches when the current one doesn’t work. this greatly improves the model’s reasoning ability.”

alphago, which defeated the world go champion in 2016, and its successor alphazero used reinforcement learning to let the model play against itself and learn which strategies raise the win rate in different board states. the model can also generate its own training data through continuous self-play.

openai o1 uses a similar approach: providing a large model with data for solving problems step by step, allowing the model to learn to correct itself and learn how to answer questions at each step to get the best answer.

"the capabilities of ai models still come from data. if there is something missing or insufficient in the data, the model cannot learn. o1 is equivalent to implementing a method to synthesize professional data. with these professional data, o1 can learn professional capabilities." said yuan jinhui, founder of silicon mobility.

when answering questions, o1 also breaks a question down into a series of smaller questions and answers them step by step. this is the "chain of thought". according to openai, o1's process for answering the following question looks like this (in the version users see, openai hides the content of the chain of thought):

question (original in english):

write a bash script that takes a matrix (represented as a string in the format '[1,2],[3,4],[5,6]') and prints the transpose of the matrix in the same format.

o1-preview's solution (chain of thought):

a user requested a bash script that would take a string representing a matrix, such as '[1,2],[3,4],[5,6]', and output its transpose in the same format.

let's first understand the format of input and output.

input: '[1,2],[3,4],[5,6]', its corresponding matrix is:

1 2

3 4

5 6

therefore, the input is a series of lines surrounded by square brackets and separated by commas.

the transpose of this matrix is:

1 3 5

2 4 6

therefore, the required output is '[1,3,5],[2,4,6]'.

overall mission:

write a bash script that takes one argument (a string representing a matrix) and outputs its transpose in the same format.

ok, now let's think about how to parse an input string in bash, build a matrix (probably using an array), transpose it, and then output the transposed matrix, keeping the same format.

constraints:

- since this is bash, we have to be careful with string parsing.

- the script should accept an input string as an argument.

method:

- parse the input string to extract matrix elements.

- build the matrix as an array of arrays.

- transpose the matrix.

- output the transposed matrix in the same format.

implementation steps:

1. capture the input string as a parameter.

2. remove all spaces (if any).

3. parse the input string and extract the numbers into an array.

- since the input format is '[1,2],[3,4],[5,6]', we can:

-- remove outer brackets if necessary.

-- split the string by '],' to get each line.

-- for each line, remove '[' and ']', then get the elements by splitting with ','.

4. build a two-dimensional array (array of arrays) in bash.

- since bash arrays are one-dimensional, we need to simulate two-dimensional arrays.

5. transpose the matrix.

- when doing a transpose, we need to swap the rows with the columns.

6. output the transposed matrix in the desired format.

- reconstruct the output string to the format '[1,3,5],[2,4,6]'.

now, let's start writing the code step by step.

potential issues:

- bash itself does not have native multidimensional arrays, so we need to simulate them.

- we need to handle numbers that may be multi-digit.

next, we can start writing the skeleton of this bash script step by step.

[script omitted]

now let's test this script using sample input '[1,2],[3,4],[5,6]'.

let me try to simulate what would happen.

i'll write this down as an answer.
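the article omits the script o1 actually produced, but the steps in the chain of thought above are concrete enough to implement. as a rough sketch, here is one possible bash version of what those steps describe; this is an illustration written for this article, not o1's real output.

```shell
# sketch: one implementation of the transpose the chain of thought outlines.
# requires bash 4+ (associative arrays, used to simulate a 2-d array).
transpose() {
  local input="${1// /}"      # step 2: strip any spaces
  input="${input#[}"          # drop the leading '['
  input="${input%]}"          # drop the trailing ']'

  # step 3: split '1,2],[3,4],[5,6' into rows on the '],[' separator
  local -a rows
  IFS=';' read -r -a rows <<< "${input//],[/;}"

  local nrows=${#rows[@]}
  local -a first
  IFS=',' read -r -a first <<< "${rows[0]}"
  local ncols=${#first[@]}

  # step 4: bash arrays are one-dimensional, so simulate a matrix with an
  # associative array keyed by "row,col"
  local -A matrix
  local -a elems
  local r c
  for ((r = 0; r < nrows; r++)); do
    IFS=',' read -r -a elems <<< "${rows[r]}"
    for ((c = 0; c < ncols; c++)); do
      matrix[$r,$c]=${elems[c]}
    done
  done

  # steps 5-6: swap rows with columns, then rebuild the '[..],[..]' format
  local out="" row
  for ((c = 0; c < ncols; c++)); do
    row="["
    for ((r = 0; r < nrows; r++)); do
      row+="${matrix[$r,$c]},"
    done
    out+="${row%,}],"
  done
  echo "${out%,}"
}

transpose '[1,2],[3,4],[5,6]'   # prints [1,3,5],[2,4,6]
```

string splitting is done entirely with parameter expansion and `read -a`, which sidesteps the multi-digit-number issue the chain of thought flags: elements are never split character by character.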

neither reinforcement learning nor chain of thought is a technique that openai proposed first.

before o1 was launched, google deepmind released an updated version of the alphageometry model trained using a similar method in july, which won a silver medal in the international mathematical olympiad (imo).

but openai has once again shown that it can be the first to turn the industry's research into a model with more striking results. according to openai, its o1-ioi model, trained specifically for the international olympiad in informatics, won a gold medal.

with its new training and reasoning methods, openai o1-preview significantly surpasses gpt-4o at solving hard math problems and handling programming tasks. on the competition math dataset aime 2024, for example, o1-preview scored more than 4 times gpt-4o's score; on the programming competition dataset codeforces, the gap widened to 5.6 times.

in mathematics competitions, programming competitions, and scientific question-answering data set tests, o1-preview and o1 significantly surpassed gpt-4o.

openai says the currently limited-release o1-preview and o1-mini are only early versions. its next model will perform like a phd student on challenging test questions in physics, chemistry, biology, and other subjects, whereas previous models were mostly at undergraduate or master's level.

openai credited 212 contributors with turning the technology into reality, comparable to the number credited for the text capabilities of gpt-4o (234). but more types of data were needed to train o1: openai says o1's pre-training data came from public datasets, proprietary data from partners, and custom datasets developed in-house, while gpt-4o used only the first two types for its text capabilities.

the o1 series, which tends to be a "reasoning model", has not surpassed gpt-4o in all aspects, and language ability is its relative weakness.

in openai's test, most people think that o1-preview is better than gpt-4o in answering questions that require reasoning, such as data analysis, programming, and mathematics, while gpt-4o is still better in personal writing, text editing, etc.

o1-preview also runs into problems that gpt-4o cannot solve; for example, it too "talks nonsense" and thinks 9.11 is bigger than 9.2.


the longer response ("thinking") time brought by the chain of thought may be the o1 series' shortcoming in actual use. when asked to "list the names of five countries whose third letter is a", gpt-4o took only 3 seconds, while o1-mini took 9 seconds and o1-preview took 32 seconds, more than ten times gpt-4o. for simple questions it is basically unusable.

unlike gpt-4o, o1-preview and o1-mini cannot browse the web, retrieve external information, or process uploaded files and images. for now, software developers look like the fastest to gain productivity from o1, but openai also limits api access: just 20 calls per minute, with no function calling, streaming, system message support, or other such features.

from training scaling to inference scaling, the computing power competition will continue

in the view of many large model researchers, the most important change of o1 is that it shows a new path to greatly improve the capabilities of large models.

the original scaling laws meant that using more data and computing power to train a model with larger parameters would result in better performance.

as shown in the figure below, o1 demonstrates that if a model spends more time and more computing power answering a question (test-time compute), performance keeps improving. nvidia senior ai scientist jim fan said on social media that this may be the most important chart in large model research since deepmind proposed the chinchilla scaling laws (a refinement of the original scaling laws) in 2022.

jim fan also proposed a possible future evolution of large models: future models may separate reasoning from knowledge, have a small "reasoning core", and also use a large number of parameters to memorize facts (knowledge) in order to perform well in tests such as trivia questions and answers.

openai also specifically mentioned in the article introducing o1 that they will continue to develop the gpt series of models. this may indicate that openai will introduce the methods used in o1 into the next generation of gpt models.

whether o1 becomes the new mainstream approach, or is combined with the gpt series into the next generation of large models, computing power requirements will likely increase further.

openai has not disclosed the o1 series' inference cost, but from how long o1 takes to answer and from openai's usage limits, one can infer that o1 demands far more inference compute than the gpt series.

chatgpt plus users who pay $20 per month can currently only use o1-preview 30 times and o1-mini 50 times per week. the current limit for gpt-4o is 4480 times per week (80 times every 3 hours), which is 90 times that of o1-mini and 150 times that of o1-preview.
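the multiples quoted above follow from the article's own figures (80 gpt-4o messages per 3-hour window; weekly caps of 50 for o1-mini and 30 for o1-preview). a quick shell check of that arithmetic:

```shell
# 80 messages every 3 hours, over a 168-hour week
gpt4o_weekly=$((80 * 168 / 3))
echo "$gpt4o_weekly"            # 4480 per week
echo "$((gpt4o_weekly / 50))"   # ~90x the o1-mini cap (89 with integer division)
echo "$((gpt4o_weekly / 30))"   # ~149x the o1-preview cap, i.e. roughly 150x
```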

o1-preview's answer time has grown from the seconds typical of gpt-series models to tens of seconds or longer, and the amount of text it processes per answer has also grown significantly. take the programming problem shown above to illustrate the chain of thought: including the chain of thought, o1 processed 6,632 characters to answer it, 4.2 times gpt-4o. longer compute time and longer output both mean higher computing power costs.

o1's boost to ai's prospects, and to computing power consumption, was quickly reflected in the capital market. since earlier this week, when media reported that openai was about to release a new model, nvidia's stock price has rebounded about 10%, and microsoft has also risen.

for those companies that are uncertain about the direction of technological evolution or have slowed their research on foundation models, there is now new work to do and a new direction to pursue. the release of o1 most likely means the competitive gap in "reasoning models" has widened again, and a round of accelerated catch-up and investment is about to begin.

"it's time to get down to business, or we really won't be in the game anymore," said a chinese large-scale model researcher.

title image: visual china

in 1957, a man-made object entered the universe for the first time and flew around the earth for three weeks. humans could look up and see a small flash of light streaking across the sky at night, parallel to the stars in mythology.

such feats sparked joy across the globe, across race and ideology. but it was not the kind of triumphant joy we might expect from a human feat. rather, as the political philosopher hannah arendt observed at the time, the mood was closer to a long-awaited relief that science had finally caught up with expectations and that “mankind had finally taken the first step towards escaping the prison of the earth.”

people always quickly adjust their expectations of the world based on technological exploration. when a science fiction writer's fantasy becomes reality, it is often because technology has finally caught up with people's expectations, or in arendt's words, "technology has realized and confirmed that people's dreams are neither crazy nor empty."

at times like today, it's better to dream a little more.

this is also what latepost hopes for in launching the tech tuesday column. we hope to regularly introduce new scientific research and technological progress, beyond the business world that latepost covers day to day.

these may be about the progress of a cutting-edge research, observations on a technological application, or tributes to some outstanding technology or even an era.

this column will record the various changes in the world from the perspective of science and technology. during this journey, we hope that readers can join us in gaining a better understanding of the world.

· FIN ·