2024-09-28
On September 28, the 4th "Young Scientists 50² Forum" was held at Southern University of Science and Technology. Shen Xiangyang, a foreign member of the US National Academy of Engineering, delivered a keynote speech titled "How Should We Think About Large Models in the Era of General Artificial Intelligence" and shared his ten thoughts on large models.
The following is a summary of his ten thoughts:
1. Computing power is the threshold: the computing power required by large models has grown enormously over the past 10 years. To build a large AI model today, as the saying goes, talking about GPUs strains feelings, and having no GPUs means no feelings at all.
2. About data: if GPT-5 comes out, its training data may reach 200T tokens. But there is not that much good data on the internet; after cleaning, 20T is probably close to the ceiling. So building GPT-5 will require, beyond the existing data, more multimodal data and even artificially synthesized data.
3. The next chapter of large models: there is a great deal of multimodal research to be done. I believe a very important direction is the unification of multimodal understanding and generation.
4. The paradigm shift in artificial intelligence: after o1 came out, the original GPT pre-training approach shifted to today's autonomous-learning path, which applies reinforcement learning at the inference stage and keeps learning on its own. The whole process closely resembles how humans think through and analyze problems, and it also requires a great deal of computing power.
5. Large models sweep across thousands of industries: in China's wave of large-model building, more and more large models are industry-specific. This trend is clear, and the share of general-purpose large models will keep shrinking.
6. AI agents, from vision to implementation: the super application has been there from the beginning. That super application is a super assistant, a super agent.
7. Open source vs. closed source: I don't think Meta's Llama is open source in the traditional sense. It only opens up the model; it does not give you the original code or data. So when we use open-source systems, we must still make up our minds to truly understand the work behind closed-source large-model systems.
8. Pay attention to AI governance: artificial intelligence has an enormous impact on thousands of industries and on society as a whole, and everyone must face it together.
9. Rethink the human-machine relationship: only by truly understanding human-computer interaction can a company become the truly commercially valuable leader of its generation of high-tech enterprises. To say now that OpenAI plus Microsoft have this era locked up is premature: they are ahead, but there is still a lot of room for imagination in the future.
10. The nature of intelligence: although large models have astonished everyone, we have no theory of large models and deep learning. The emergence of intelligence in AI has been much discussed but never clearly explained.
"young scientists 50² forum" is the annual academic meeting of the new cornerstone science foundation, organized bysouthern university of science and technology, tencent sustainable social value division, and new cornerstone science foundation jointly organized. the new cornerstone science foundation was established and independently operated by tencent with an investment of 10 billion yuan over 10 years. it is currently one of the largest public welfare science foundations in china. its establishment and operation are tencent's long-term investment in technology for good. concrete actions for science funding.
"young scientists 50² forum" is an interdisciplinary academic exchange platform for "scientific exploration award" winners. the "scientific exploration award" was established in 2018. it is a public welfare award funded by the new cornerstone science foundation and led by scientists. it is currently one of the largest funding projects for young scientific and technological talents in china. each winner will share his or her big idea and latest exploration on the forum at least once during the five-year period of funding. "50²" means that the 50 young scientists selected by the "scientific exploration award" every year will have a significant impact on scientific and technological breakthroughs in the next 50 years.
The following is the full text of Shen Xiangyang's speech at the forum:
I am very happy to have the opportunity to share with you in Shenzhen today some of what I have recently been learning and thinking about in artificial intelligence.
Continuing the topic of artificial intelligence that Mr. Yao Qizhi just discussed, let me talk about some of the things we are doing now in the era of large models, looking at the question especially from the perspective of technology integration and industrial transformation.
In fact, the importance of technological development is not unique to the era of artificial intelligence: the entire history of human development is a history of technological development, and without technology there is no GDP growth. We need not look back as far as drilling wood for fire or inventing the wheel. Just consider the many remarkable breakthroughs in physics over the past 100 years, and in artificial intelligence and computer science over the past 70 years, and you can see how many opportunities there have been.
Our topic today is artificial intelligence and large models. Over the past few years, everyone has surely been astonished, step by step, by new AI experiences. Even having worked in artificial intelligence all my life, I could hardly have imagined today's situation a few years ago.
I want to talk about three examples: the first is generating text from text, the second is generating images from text, and the third is generating video from text. We just mentioned systems like ChatGPT, which exist both internationally and domestically. For example, before coming to give this speech today, I asked ChatGPT: I am going to attend Tencent's Young Scientists 50² Forum and give a speech; given my background, what topics should I talk about? You may find this a bit funny, but once you actually use it, you realize it is very good.
Everyone is familiar with ChatGPT. Two years ago, OpenAI launched a text-to-image system: you give it a passage of text and it generates an image. Seven months ago, it released Sora: you give it a prompt and it generates a 60-second high-definition video for you, such as the video of walking through the streets of Tokyo. It is stunning. (I won't show the video due to time constraints.)
Let me give an example of text-to-image generation. My specialty is computer graphics, and I think I have a good sense of whether an image is good or bad. Two years ago this image appeared: the first AI-generated image in human history to be featured on the cover of an American fashion magazine (Cosmopolitan). A digital artist in San Francisco used OpenAI's system and gave it one prompt, which produced this result. The prompt was: in the vast starry sky, a female astronaut strides across Mars toward a wide-angle lens. I don't have that much artistic talent, but I was stunned when I saw this picture, and I think you will agree with me: when artificial intelligence draws a picture like this, it really does look like a female astronaut. This artificial intelligence has reached a very intelligent level.
Today we have such amazing technologies and even such amazing products. In China we are also working very hard to build large models, from technology to models to downstream applications. Academician Yao just described a great deal of Tsinghua University's latest work. So what I want to share with you is how we should think about large models in the era of general artificial intelligence, along with some of my own opinions.
The first thought: computing power is the threshold.
The most important driver behind today's general artificial intelligence, large models, and deep learning is the overall growth of AI computing power in recent years.
Over the past 10 years, the computing power used by large models has kept increasing, initially by six to seven times per year and later by more than four times per year. Let me ask you a question: if something increases four times a year, how many times does it increase in 10 years? Think about it; I will come back to this question later.
Everyone knows that the company benefiting most from this wave of AI development is Nvidia. Nvidia's shipments increase year after year, its chips' computing power keeps rising, and the company has become one of only three in the world (Microsoft, Apple, Nvidia) with a market value of US$3 trillion. The most important reason is everyone's year-after-year demand for computing power: purchases of Nvidia chips in 2024 are still growing rapidly. Elon Musk, for example, is building a cluster of 100,000 H100 cards. Building a 10,000-card system is very difficult; building a 100,000-card system is even harder, and the networking requirements are extremely high.
When we talk about computing power and large models today, the most important thing is scaling laws (computing power and data): the more computing power, the more intelligence grows, and so far no one has hit the ceiling. Note, however, that as the amount of data grows, the required computing power does not grow linearly; it grows more like the square.
That is because when the model gets larger, the amount of data needed to train it must grow as well, so the compute requirement scales roughly quadratically. Hence the enormous demand for computing power over the past 10 years. So I will just say this: to build a large AI model today, talking about GPUs strains feelings, and having no GPUs means no feelings at all.
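To make the quadratic claim concrete, here is a minimal sketch using the standard training-compute approximation C ≈ 6·N·D FLOPs (N parameters, D training tokens) from the scaling-law literature; the specific model and token counts below are illustrative assumptions, not figures from the speech.

```python
# Why compute scales roughly quadratically: a common approximation is
# C ~= 6 * N * D training FLOPs (N = parameters, D = tokens). If data is
# scaled in proportion to model size (D proportional to N), then C ~ N^2.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs: C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

# Illustrative (assumed) sizes: doubling the model while doubling the data
# quadruples the compute.
base = training_flops(70e9, 1.4e12)       # a 70B model on 1.4T tokens
doubled = training_flops(140e9, 2.8e12)   # 2x parameters, 2x tokens
print(f"{doubled / base:.0f}x compute for 2x model + 2x data")  # -> 4x
```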
I asked you earlier: if something rises 4 times every year, how many times does it rise in 10 years? Those of us who study computers know Moore's law: computing power roughly doubles every 18 months, which is how Intel developed over the years. Why has Nvidia now surpassed Intel? A very important reason is that the growth rates are different. Doubling every 18 months gives roughly a 100-fold increase in 10 years, which is already remarkable; increasing 4 times every year gives about 1 million times in 10 years. That growth is astonishing, and seen this way, it is understandable that Nvidia's market value has risen so fast over the past 10 years.
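The arithmetic behind that comparison takes only a few lines; this snippet simply compounds the two growth rates quoted above.

```python
# Compound the two growth rates over a decade.
YEARS = 10

# Moore's law: doubling roughly every 18 months.
moores_law = 2 ** (YEARS * 12 / 18)   # ~100x over 10 years

# Large-model compute: roughly 4x per year, per the speech.
llm_compute = 4 ** YEARS              # 4^10 = 1,048,576, ~1 million x

print(f"Moore's law over {YEARS} years: ~{moores_law:,.0f}x")
print(f"4x per year over {YEARS} years: ~{llm_compute:,}x")
```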
The second thought: about data.
Computing power, algorithms, and data are the three key factors in artificial intelligence. As I mentioned, training general artificial intelligence takes a huge amount of data. When GPT-3 came out, still at the stage of publishing papers, it was said to need 2 trillion tokens of data; by the time GPT-4 came out, that was about 12T; GPT-4 has kept training, and today the estimate exceeds 20T. Anyone who follows artificial intelligence knows that everyone has been waiting a long time for GPT-5, which has not come out. If GPT-5 does come out, my personal judgment is that its data volume may reach 200T. But looking back at the internet, there is not that much good data: after cleaning, 20T is probably close to the ceiling. So to build GPT-5 in the future, beyond the existing data, more multimodal data and even artificially synthesized data will be needed.
A very interesting thing is that over the past thirty or forty years, everyone has been sharing their information online. We used to think we were working for search engines. What is even more remarkable now is that our accumulation over those thirty or forty years was for a moment like ChatGPT: it integrated everything and, with powerful computing power, learned an artificial intelligence model from it. That is what happened.
The third thought: the next chapter of large models.
Having come this far, what should we do next? First came language models, represented by ChatGPT, whose underlying technology is natural language processing. What everyone is building today is multimodal models, represented by GPT-4, much of whose technology is computer vision. Going forward, we need to develop embodied intelligence. What is embodied intelligence for? In essence, to build a world model: even a multimodal model has no underlying physical model of the world, so we need to build one. A world model means not only reading thousands of books but also traveling thousands of miles, feeding more knowledge of the world back into your brain. So we should build robots, and I think Shenzhen should make up its mind to build robots and embodied intelligence. Within robotics there is a special track called autonomous driving: an autonomous vehicle is a special robot, one that drives along given routes.
What should we do? There is a great deal of multimodal research to be done, and I believe a very important direction is the unification of multimodal understanding and generation. Even in a system like Sora the two are separate: multimodal generation and multimodal understanding are not unified, and there is a lot of research work we can do in this area.
To give an example: several of my students founded a large-model company, Step Stars, whose multimodal understanding is very good. Show the AI a picture and ask why the behavior in it is called an "invalid skill", and it will explain: the picture shows a child rolling on the ground in a tantrum while the mother remains indifferent, looking at her phone and sipping a drink; that is why the child's skill is called invalid. AI is getting better and better at understanding images.
The fourth thought: the paradigm shift in artificial intelligence.
Two weeks ago, OpenAI released its latest model, o1. As I mentioned, GPT kept developing, but after GPT-4, GPT-5 has not been released. Everyone is wondering: if all we do is keep increasing the parameters of large models, have we hit a ceiling? No one knows; it has not been released, and in China we have not built a larger model either.
But now a new dimension has emerged: instead of scaling only the pre-training stage, scale the computation at inference time. The original GPT approach has shifted to today's autonomous-learning path, which applies reinforcement learning to the reasoning step and keeps improving through self-learning.
In the past, pre-training was basically about predicting the next word, the next token. The new idea is to work on a scratchpad: try this path and that path to see which one is right, just as the human brain does. Human thinking has a fast system and a slow system. Just as when we do a math problem, we first work on a draft to see which approach will succeed: there is a chain of thought, and the system looks for opportunities while optimizing that chain. So far only OpenAI has released such a system, and I encourage everyone to look at some of the examples.
Most importantly, the whole process closely resembles how humans think through and analyze problems: draft, verify, correct errors, and start over. The space for this kind of thinking is very large, and doing it also requires a great deal of computing power.
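As a rough illustration of this draft-and-verify idea, here is a minimal, hypothetical sketch of inference-time search: sample several candidate reasoning chains and keep the best-scoring one. The `generate_drafts` and `score_draft` placeholders stand in for a language model and a learned verifier; this is the general pattern the speech describes, not OpenAI's actual o1 method.

```python
# A toy best-of-N search over reasoning drafts ("draft, verify, correct").
import random

def generate_drafts(prompt: str, n: int) -> list[str]:
    # Placeholder: a real system would sample n reasoning chains from an LLM.
    return [f"reasoning chain {i} for: {prompt}" for i in range(n)]

def score_draft(draft: str) -> float:
    # Placeholder: a real system would use a learned verifier or reward model.
    return random.random()

def solve(prompt: str, n_drafts: int = 8) -> str:
    """Spend extra inference-time compute: sample many chains, keep the best."""
    drafts = generate_drafts(prompt, n_drafts)
    return max(drafts, key=score_draft)

print(solve("What is 17 * 24?"))
```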
The fifth thought: large models sweep across thousands of industries.
Every company has to face the opportunities brought by large models, but not every company needs to build a general-purpose large model. If you do not even have 10,000 GPUs, you have no real chance of building one: a general-purpose large model needs at least a 10,000-card cluster.
For example, when GPT-4 came out, its total training compute was 2×10^25 FLOPs. Reaching that amount takes about a year of running 10,000 A100 cards; without that amount, there is no real general-purpose large model. On top of a general model, we can build our own industry models, for finance or insurance for example; perhaps a thousand cards are enough to do very well, with some fine-tuning on top. For an enterprise, you have your own data, both internal data and customer data; with that data and a few dozen or a few hundred cards, you can build a very good model for your own company. So the stack is built up layer by layer.
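A back-of-the-envelope check of the "10,000 A100s for a year" figure, under assumptions of my own (an A100's peak is about 312 TFLOPS in BF16, and a large distributed run might sustain roughly 20% of peak):

```python
# Rough sanity check of "10,000 A100 cards for a year" at 2e25 FLOPs.
A100_PEAK_FLOPS = 312e12   # assumed dense BF16 peak per card
UTILIZATION = 0.20         # assumed effective utilization of a large run
N_CARDS = 10_000
TARGET_FLOPS = 2e25        # training compute quoted for GPT-4 above

cluster_flops_per_sec = N_CARDS * A100_PEAK_FLOPS * UTILIZATION
seconds = TARGET_FLOPS / cluster_flops_per_sec
print(f"~{seconds / (86_400 * 365):.1f} years")   # ~1.0 year
```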
Of course, there is another very important dimension that I like very much: the personal model of the future. As we use PCs and mobile phones, data gradually accumulates and they understand us more and more. In the future, I believe there will be a super-intelligent AI that helps you: after collecting the relevant data, it can build a personal model of your own. On the personal-device side, mobile phones are the natural platform; on the PC side, companies such as Microsoft and Lenovo are also promoting the concept of the AI PC, so there are opportunities there as well.
In China's wave of large-model building, more and more of the models are industry models. Here is an example. Because China's large models must be approved by the Cyberspace Administration of China before going online, by the end of July this year a total of 197 models had been approved, of which 70% were industry models and 30% were general-purpose models. The trend is clear, and the share of general-purpose large models will keep shrinking. For example, a financial model can be built on top of a general large model; a company in Shanghai built one for its financial clients, so that when Nvidia's earnings report comes out, it can immediately summarize the highlights and the problems.
The sixth thought: AI agents, from vision to implementation.
Where are the biggest super applications of large models, and where are the biggest opportunities? Many people are still trying to find a super app. In fact, the super application has been there from the beginning: it is a super assistant, a super agent.
Gates and I worked together at Microsoft for many years, and we both thought about this problem. What makes it hard? The difficulty is that doing genuinely useful work requires understanding a workflow: when you ask a question, it must be broken down step by step. What we can do today are tasks of modest complexity, such as customer service or a personal assistant; many jobs still cannot be done. Why not? You need a digital brain. The large model underneath is only the first step, and its capability is not yet strong enough to carry out all the work above it step by step. To build a real agent that can get things done, it must understand what the underlying problems are, and each part needs the corresponding skills.
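A minimal, hypothetical sketch of the pattern described here: decompose a task into a workflow of steps, then dispatch each step to a matching skill. The planner and the skills below are placeholders for illustration, not any real product's API.

```python
# A toy agent loop: plan a workflow, then execute each step with a skill.

def plan(task: str) -> list[str]:
    # Placeholder planner: a real agent would ask an LLM to decompose the task.
    return ["look up customer record", "draft reply", "send reply"]

SKILLS = {
    "look up customer record": lambda: "record found",
    "draft reply": lambda: "reply drafted",
    "send reply": lambda: "reply sent",
}

def run_agent(task: str) -> None:
    for step in plan(task):
        result = SKILLS[step]()   # each workflow step maps to a concrete skill
        print(f"{step} -> {result}")

run_agent("answer a customer support ticket")
```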
There are already many good examples built with today's models, such as an AI health consultant, or an assistant that discusses its understanding of cosmetics and recommends products. You will see many applications in this area next.
The seventh thought: open source and closed source.
In the development of world science and technology over the past few decades, and especially of China's science and technology, two things have been very important.
The first is the emergence of the internet. With the internet, you can find all the papers and materials online.
The second is open source. Open source lets you sharply close the gap with the leaders when building applications. But open source for large models is not the same as open source for databases and other software, even though open-source capabilities are now approaching closed-source ones. Many companies in China are also doing open-source work. The best open-source model today is Meta's Llama 3.1, which claims to be close to OpenAI's; I don't think so. Nor do I consider it open source in the traditional sense: it only opens up the model, without giving you the original code or the data. So when we use open-source systems, we must still make up our minds to truly understand the work behind closed-source large-model systems.
The eighth thought: pay attention to AI governance.
Because AI is developing so rapidly, the whole world attaches great importance to AI safety, for the impact of this matter is enormous. Artificial intelligence has a huge impact on thousands of industries and on society as a whole, and the development of the whole world truly requires everyone to face it together.
The ninth thought: rethink the human-machine relationship.
I just introduced text-to-text, text-to-image, and text-to-video generation. How much of that is the machine's intelligence, and how much is the shock that human-computer interaction brings us?
About 10 years ago, New York Times columnist John Markoff wrote a book I like very much, "Machines of Loving Grace", which summarized the two lines along which technology has developed: one is AI, artificial intelligence; the other is IA, intelligence augmentation, that is, human-computer interaction. Once computers existed, they helped people do many things; chess is one example.
In fact, only by truly understanding human-computer interaction can a company become the truly commercially valuable leader of its generation of high-tech enterprises. Today's artificial intelligence interface has become very clear: it is dialogue, and today's representative is ChatGPT. But to say now that OpenAI plus Microsoft own this era is still too early. They are ahead, but there is still a lot of room for imagination in the future.
The tenth thought: the nature of intelligence.
Although large models have astonished everyone today, we have no theory of large models and deep learning. Today we would love to have any theory that even feels right. It is unlike physics, where everything from the vast starry sky down to the smallest quantum is described by beautiful physical laws. Artificial intelligence today has no such theory, and no explainability or robustness; today's deep-learning framework cannot reach true general artificial intelligence.
As for the emergence of intelligence in artificial intelligence, everyone has talked about it but no one has explained it clearly. Why does intelligence emerge when the model is large enough? Why can a 70B model exhibit emergent intelligence? We have no such principle. So we are also working very hard on this question. Last summer I organized a workshop at the Hong Kong University of Science and Technology on the theme of "Mathematical Theory for Emergent Intelligence". Discussing emergent intelligence requires explaining some scientific and mathematical principles clearly, and it needs more people who are willing to explore to join in. Especially with the emergence of Tencent's "Scientific Exploration Award" and "New Cornerstone Investigator" programs, more young scientists have joined, and they have more confidence and conviction to dig into the hard questions whose breakthroughs matter for the future development of artificial intelligence.
Congratulations again to all the winners and young scientists. The development of science and technology depends on generation after generation of young people, especially in artificial intelligence. Thank you all again.