
HKU's Ma Yi: Without theory, large models are like blind men touching an elephant; big names gather to discuss the next step for AI

2024-07-24


Xifeng, reporting from Aofeisi
Quantum Bit | Official account QbitAI

"I want to ask everyone here a question. Whether you are a student of Qiuzhen College or Qiu Chengtong Junior Class, if you don't know the answer to this question, then you shouldn't be in this class!"

At the "Basic Science and Artificial Intelligence Forum" of the 2024 International Basic Science Conference, the moment Rui Yong, CTO of Lenovo Group and foreign member of the European Academy of Sciences, said these words, everyone in the audience grew a little nervous.

But then he asked the following question: which is bigger, 13.11 or 13.8?



Come on, who doesn't know this meme by now?

This time, though, the point was not to mock the model for being slow-witted. Several AI experts from academia and industry analyzed a series of problems, such as model "hallucinations", and offered their views on the question "What's next for artificial intelligence?"



In summary, the following viewpoints are included:

  • The next step in the development of big models is to move beyond the search paradigm of “no abstract ability, no subjective value, and no emotional knowledge.”
  • Commercial applications lag behind the growth of the model itself, and there is a lack of a super product that can truly reflect the value of the investment.
  • Under the limitation of hallucination, the next step is to think about how to further expand the generalization and interactivity of the model. Multimodality is an option.
  • It is a very important issue to enable intelligent agents to know the limits of their capabilities.

Ma Yi, Dean of the School of Data Science and Head of the Department of Computer Science at the University of Hong Kong, went so far during the discussion as to question today's mainstream "artificial intelligence":

The development of artificial intelligence technology has accumulated a great deal of experience, some of which we can explain and some of which we cannot. Now is the time when theory is sorely needed. In fact, academic research over the past ten years has arguably made few breakthroughs. It is quite possible that the rapid progress of industry and engineering technology has disrupted the pace of academic research itself.



Let's take a detailed look at what these heavyweights said.

What is the essence of intelligence?

At the event, Ma Yi, Dean of the School of Data Science and Head of the Department of Computer Science at the University of Hong Kong, delivered a keynote speech titled "Returning to the Theoretical Cornerstone and Exploring the Essence of Intelligence".

The views he expressed echoed the issues discussed at the roundtable.

In the speech, he reviewed the historical development of AI and put forward his own views on its current state.



He first talked about the evolution of life and intelligence.

In his view, life is the carrier of intelligence, and the emergence and evolution of life are the result of intelligent mechanisms at work. The world is not random but predictable, and in the course of continuous evolution, life learns ever more predictable knowledge about the world.

Survival of the fittest is a form of feedback for intelligence, similar to today's concept of reinforcement learning.



From plants to animals, reptiles, birds, and humans, life has kept improving its intelligence. Yet there is a pattern: the more intelligent a life form, the longer it stays with its parents after birth. Why?

Professor Ma Yi explained further: because genes are not enough, some abilities must be learned. The stronger the learning ability, the more there is to learn; this marks a higher form of intelligent agent.

If we learn individually, it is still not fast enough or good enough, so humans invented language, and human intelligence became a form of collective intelligence.

The emergence of collective intelligence is a qualitative change. We not only learn predictable phenomena from empirical observation but also develop abstract logical thinking. We call this human intelligence, and it later inspired what came to be called artificial intelligence.



Next, he talked about the origins of machine intelligence.

Since the 1940s, humans have begun to try to make machines simulate the intelligence of biological organisms, especially animals.

Researchers began by modeling neurons and exploring what perception in the brain is. It later became clear that simulating the animal nervous system required building entire artificial neural networks, and the research grew more and more complex.



The road was not all smooth sailing: the field went through two cold winters, during which people discovered limitations of neural networks. Some researchers nonetheless persisted in tackling these challenges.

Later, with the growth of data and computing power, training neural networks became feasible; networks grew deeper and deeper, and performance got better and better.



But there is one overriding problem: these networks are designed empirically. They are black boxes, the box keeps getting bigger, and no one knows what is going on inside.

What is wrong with black boxes? From a technical perspective, empirical design can work; you can keep trying and failing. But it is very costly, takes a long time, and the results are hard to control. Moreover:

As long as there are phenomena in this world that cannot be explained but are very important, and many people are kept in the dark, panic will be created. This is happening now.

So, how to open the black box? Professor Ma Yi proposed to return to the original question: Why do we need to learn? Why can life evolve?

He particularly emphasized that we must talk about things that can be achieved through computing:

Don’t talk about anything abstract. This is my advice to everyone. You must talk about how to calculate and how to execute this thing.



So what should we learn?

Professor Ma Yi believes that we should learn things with predictable regularities.

For example, if you hold a pen in your hand and let go, everyone knows what will happen, and if you move quickly, you can catch it. This was known before Newton. It seems that humans and animals have done a good job of modeling the external world.



Mathematically, predictable information is uniformly reflected in the low-dimensional structure of data in high-dimensional space.

So what is the unified computational mechanism? Professor Ma Yi gave the answer: like attracts like, and unlike repels unlike. In essence it is that simple.

How to measure whether it is good or not? Why do we need to compress?

He gave an example, shown below. Suppose the world were random, nothing were known, and anything could happen; if we represent the possibilities as blue balls, then any blue ball could occur in the next second.

But to remember which of these things actually happened, you would have to encode the entire space and assign each outcome a code. If only the region with the green balls can actually occur, the blue region becomes far less relevant.

The smaller the region in which what happens is confined, the more we know about the world. This is what information theory, founded in the 1940s, was formalizing.

To find these green regions more effectively, the brain must organize them well. Our brain is organizing precisely this phenomenon, this low-dimensional structure.
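To make "predictable information lives on a low-dimensional structure" concrete, here is a minimal sketch (not from the talk; all sizes are assumed for illustration): data that appears 100-dimensional but is generated from only 3 latent factors reveals its structure through a sharp drop in its singular values.

```python
# Minimal sketch: apparent 100-D data with only 3 true degrees of freedom.
# The singular-value spectrum collapses after the 3rd value, exposing the
# low-dimensional (i.e., predictable) structure hidden in high dimensions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 100, 3                  # samples, ambient dim, latent dim
latent = rng.normal(size=(n, k))        # the 3 real degrees of freedom
mixing = rng.normal(size=(k, d))        # embed them into 100-D space
X = latent @ mixing

s = np.linalg.svd(X - X.mean(0), compute_uv=False)
print(np.round(s[:6] / s[0], 6))        # first 3 values large, rest ~0
```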



How can this be achieved computationally?

Professor Ma Yi said that all deep networks are, in effect, doing exactly this. The Transformer, for example, does it by segmenting images into patches and grouping them.



In fact, each layer of a neural network is compressing data.

Mathematics plays a very important role here: you rigorously specify what to optimize and rigorously work out how to optimize it. Once you have done these two things, you will find that the operators you derive closely resemble many operators found empirically.
Whether it is the Transformer, ResNet, or CNN, they all do the same thing in different ways, and what they do can be fully explained in terms of statistics and geometry.
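Professor Ma Yi's published research line, the principle of maximal coding rate reduction (MCR²), makes "each layer compresses" measurable. The sketch below uses the coding-rate quantity from that line of work, R(Z) = ½ log det(I + d/(nε²) · ZZᵀ), with all data and parameter values assumed for illustration: features squeezed toward a low-dimensional subspace need fewer bits to encode.

```python
# Illustrative coding-rate measure in the spirit of MCR^2 (Yu et al.):
# the bits needed to encode features Z (d x n) up to distortion eps.
# All sizes and the eps value here are assumed for illustration.
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z @ Z.T)."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

rng = np.random.default_rng(0)
Z_spread = rng.normal(size=(16, 200))                     # scattered features
scale = np.array([1.0] * 2 + [0.01] * 14)[:, None]
Z_squeezed = Z_spread * scale                             # nearly 2-D features

# Compressing toward a low-dimensional structure lowers the coding rate.
print(coding_rate(Z_spread) > coding_rate(Z_squeezed))    # True
```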



But the optimum of the optimization itself may not be the correct solution, and important information may be lost during compression. How do you prove that the dimensions of information retained are adequate? How do you prove there will be no hallucinations?

Back to the basics of learning: why do we need to remember these things? The brain simulates the physical world in order to predict it better.

Later, Ma Yi mentioned the concept of alignment:

So alignment is not about aligning with people; alignment means the model aligning itself with what it has learned.



Learning an autoencoding between the outside world and the internal representation is not enough by itself. How do animals in nature learn a physical model of the external world?

They constantly use their own predictions of the external world and check them against observation; as long as prediction and observation are consistent, that is enough. This involves the concept of a closed loop.

As long as it is a living creature, as long as it is an intelligent creature, it is a closed loop.
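As a toy illustration of this closed loop (predict, observe, correct), consider the sketch below. Everything in it, the external process, the one-parameter internal model, and the learning rate, is assumed purely for illustration: the internal model is updated only by its prediction error, and it ends up aligned with the world it observes.

```python
# Toy closed loop: an internal model predicts the next observation of a
# simple external process and corrects itself using only prediction error.
true_a = 0.9             # the external world: x_next = 0.9 * x
a_hat, lr = 0.0, 0.2     # internal model parameter and learning rate
x = 1.0

for _ in range(500):
    pred = a_hat * x                 # predict the next observation
    x_next = true_a * x              # observe what actually happens
    error = x_next - pred            # compare prediction and observation
    a_hat += lr * error * x          # correct the model: the loop closes
    x = x_next if abs(x_next) > 1e-3 else 1.0  # restart the decaying signal

print(round(a_hat, 3))  # ~0.9: the model has aligned with what it learned
```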



Professor Ma Yi then pointed out that we are still far from true intelligence.

What is intelligence? People often confuse knowledge with intelligence. Does a system that has knowledge therefore have intelligence? A truly intelligent system must have the capacity to improve itself and grow its own knowledge.

Finally, Professor Ma Yi gave a summary.

Looking back at history: in the 1940s everyone wanted machines to imitate animals, but in the 1950s Turing posed a different question: can machines think like humans? At the Dartmouth Conference in 1956, a group of people sat down together with the aim of pursuing the intelligence unique to humans, as distinct from animals: abstraction, symbolic manipulation, logical reasoning, causal analysis, and so on.

This is what they defined artificial intelligence to do in 1956, and later these people basically won the Turing Award. So if you want to win the Turing Award in the future, do you choose to follow the crowd or do something unique...

Now look back: what have we been doing for the past ten years?

At present, "artificial intelligence" is applied to image recognition, image generation, text generation, compression and denoising, and reinforcement learning. Professor Ma Yi believes that, fundamentally, what we are doing is at the level of animal intelligence, including predicting the next token and the next frame of an image.

It is not that no one worked on the human-unique capabilities later, but that work was never the mainstream of large models.



He further explained that with enough money and data, many aspects of model performance will keep improving; but go long enough without theory and problems arise, just like blind men touching an elephant.



Professor Ma Yi said that by sharing his personal experience, he hopes to give some inspiration to young people.

With principles in hand, we can design boldly, rather than waiting for someone to invent the next network that seems to work and then all piling in to use it. So where is your opportunity?

Let’s take a look at how other AI experts in the roundtable forum answered the question “What’s the next step for artificial intelligence?”

What’s next for artificial intelligence?
The big model needs a paradigm shift

Guo Yike, Fellow of the Royal Academy of Engineering, the Academia Europaea, and the Hong Kong Academy of Engineering Sciences, and Chief Vice-President of the Hong Kong University of Science and Technology, believes we are at a very interesting moment right now.

Because the scaling law is widely accepted, the war of a hundred models has gradually become a war of resources. It seems there are only two things left to do: with the Transformer architecture in hand, simply solve the problems of high computing power and big data.

In his opinion, however, this is not the case. The development of AI still faces many problems, one of which is limited computing power versus unlimited demand.

In this case, how should we make a large model? Academician Guo shared his thoughts through some practices.

First, Academician Guo mentioned that under computing-power constraints, the more economical MoE (Mixture of Experts) approach can still achieve very good results.
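For readers unfamiliar with MoE, here is a minimal sketch of the routing idea (all names and sizes are assumed): a router sends each token to only the top-k of many expert networks, so total capacity grows while per-token compute stays roughly flat, which is exactly the economy being described.

```python
# Minimal mixture-of-experts forward pass (all sizes assumed for illustration):
# only the top-k experts chosen by the router run for each token.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 8, 4, 2

W_router = rng.normal(size=(d, n_experts))                # routing weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ W_router                     # a routing score per expert
    top = np.argsort(logits)[-k:]             # pick the top-k experts
    w = np.exp(logits[top]); w /= w.sum()     # softmax over the chosen ones
    # Only the k chosen experts do any work; the others stay idle.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

print(moe_forward(rng.normal(size=d)).shape)  # (8,)
```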



In addition, it is a hard problem to keep improving a trained model with new data so that it remembers what it should remember, forgets what it should forget, and can still recall what it forgot when needed.

Academician Guo disagreed with the industry's claim that "the data has been used up". "In fact, the model has merely compressed it, and the compressed data can generate new data," that is, data can be produced with generative models.

Next, not all models need to be learned from scratch: knowledge can be embedded into the base model. There is a great deal of work to be done in this area as well.

Beyond computing power, there is another problem on the algorithm side: machine intelligence and human intelligence are cultivated in opposite directions.

Academician Guo believes that in training large models, the harder issues come not at the early stages but at the later ones.

As shown in the figure below, the evolutionary path of the large model runs from self-learning to indirect knowledge to values to common sense, while the path of human education is exactly the reverse.



Because of this, Academician Guo believes that we should move beyond today’s big model search paradigm of “no abstract ability, no subjective value, and no emotional knowledge.”

We all know that human language is remarkable. It carries not only content and information but also humanity and emotional force. How to build these into the model is an important direction for future research.



In summary, Academician Guo believes that there are three stages of development for artificial intelligence:

The first stage is grounded in authenticity. The second is grounded in values: the machine must be able to express its own views, form its own subjective values, and change those views with its environment. Third, once it has values it will understand what is novel, and only with novelty can it create.

At the creation stage, so-called hallucination is not a problem, because hallucination is only a problem under the authenticity paradigm. Writing a novel requires hallucination; without it, you cannot write a novel at all. It needs only consistency, not authenticity, so it needs only to reflect a value. In this sense, the development of large models truly requires a paradigm change.
The development of large models lacks a "super product"

He Xiaodong, Vice President of JD.com Group and Adjunct Professor and Doctoral Supervisor at the University of Washington, believes AI will face three problems in its next step.

First of all, he believes that in a sense, the development of large models has entered a plateau period.

Given data and computing-power limits, simply scaling up may hit a ceiling, and computing resources will become an ever heavier burden. Judging by the recent price war, the economic returns generated by large models may not even cover the electricity bill, which is naturally unsustainable.

Secondly, Professor He believes that commercial applications as a whole lag behind the scaling of models themselves, which will become a problem in the medium to long term:

Especially at such scale, this is no longer merely a scientific problem; it becomes an engineering problem too. When parameters reach the trillions and the training data reaches 10 trillion tokens, we must ask what social value it brings.

Therefore, Professor He believes the field lacks a super app, a super product that can truly reflect the value of the investment.

The third problem is a more specific one: large-model hallucination.

If we want to build an AI industry on top of foundation models, we must hold their hallucinations to extremely high standards. If the foundation model has a high error rate, it is hard to imagine more commercial applications being stacked on top of it.
Serious industrial applications require solving hallucination.

Professor He believes that under the constraint of hallucination, the next step is to think about how to further expand models' generalization and interactivity; multimodality is an inevitable choice.



Large models lack awareness of “capability boundaries”

Rui Yong, CTO of Lenovo Group and foreign member of the European Academy of Sciences, gave his views on the next step for AI from an industrial perspective.

He said that from an industry perspective, what matters more is how models are deployed. On deployment, Dr. Rui Yong made two main points:

  • It is not enough to have a large model; agents must be developed.
  • A cloud-side model alone is not enough; a hybrid framework is needed.



Specifically, Dr. Rui Yong first cited several studies and pointed out that the limitations of large models are becoming increasingly obvious. The question "Which is bigger, 13.8 or 13.11?" mentioned at the beginning, for example, shows that the model does not truly understand the problem.
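One hedged way to see why this question trips models up: the very same pair of literals orders differently depending on whether it is read as decimal numbers, as strings, or as software versions, and a system that pattern-matches surface forms can easily land on the wrong reading.

```python
# The same two literals compare three different ways under three readings.
a, b = "13.11", "13.8"

print(float(a) > float(b))   # False: as decimals, 13.8 is bigger
print(a > b)                 # False: lexicographically, '1' < '8'
va = tuple(int(p) for p in a.split("."))
vb = tuple(int(p) for p in b.split("."))
print(va > vb)               # True: as versions, 13.11 comes after 13.8
```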

In his view, today's large models merely stitch together the massive amounts of fragmented information they have seen in a high-dimensional semantic space. Stacking computing power and ever larger networks to build generative models is not enough; the next step should be development toward intelligent agents.

Dr. Rui Yong placed particular emphasis on the question of large models' capability boundaries.

Today's large models do not actually know where the boundaries of their own capabilities lie.
Why do large models hallucinate and talk nonsense? They are not trying to deceive us; they simply do not know what they know and what they do not know. This is a very important issue, so I think the first step is to make the intelligent agent aware of the boundaries of its own capabilities.

In addition, Dr. Rui Yong said that intelligence alone is not enough for AI to be deployed. Public large models in the cloud need privatized versions for enterprises; data-driven and knowledge-driven approaches combine into hybrid AI models; small models remain very useful in many scenarios; and there are personal models that understand individual preferences.

The result will not be a large model living entirely in the cloud, but a hybrid large model combining device, edge, and cloud.