
Talking about how to think about big models with deep learning scientist Yann LeCun

2024-08-09


With the advancement and popularization of generative AI over the past two years, using large models to generate content has gradually become part of ordinary people's lives. The process looks easy: we type in an instruction and the large model directly outputs an answer. Behind this, however, no one knows the model's internal workings or how it reaches its decisions. This is the well-known "machine learning black box."

Because black-box models cannot be explained, the safety of AI has always been questioned. So scientists have begun trying to open the black box of large models, an effort the industry calls "white-box research." On the one hand, white-box research can help people understand black-box models, and thus optimize large models and improve their efficiency. On the other hand, its goal is to push AI from an engineering discipline toward a science.

This time, we invited Yubei Chen, Assistant Professor of Electrical and Computer Engineering at the University of California, Davis, whose research focuses on the "white-box model." He was also a postdoctoral researcher with Yann LeCun, Turing Award winner and Meta's chief AI scientist. In this episode, he talked to us about the latest progress in white-box model research, and shared his impressions of Yann LeCun, a scientist who has lived through the ups and downs of the AI industry but remains purely focused on it.

Graphic by Violet Dashi. Illustrations by Nadia and Simple Line

The following are selected excerpts from the interview.

01 Human brain and large model

Silicon Valley 101: Could you briefly introduce the "white-box model" research you are doing? In your research, have you found any way to explain how GPT gets from an input to an output?

Chen Yubei:In fact, a relatively large goal of this direction is to promote deep learning from a purely empirical discipline to a scientific discipline, or to turn engineering into science, because engineering is currently developing rapidly but science is relatively slow. In the past, there was a model called word embedding, which can learn some representations of language.

At that time, everyone actually had a question: our performance on tasks has improved, but what exactly caused this improvement? So we did a very early piece of work, which was to try to open up these word representations. When you open them up, you find some very interesting phenomena.

For example, for the word apple, you can find some meta-meanings in it: one may mean fruit, another dessert, and if you dig deeper you will find meanings related to technology and products, referring of course to Apple's products. So you find that you can uncover these meta-meanings within a word, and you can then extend this method to large language models.
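To make the idea concrete, here is a minimal sketch of one common way to look for such meta-meanings: treat each word vector as a sparse combination of shared "atoms" learned by dictionary learning. The data, sizes, and the word index below are placeholders, and this is not necessarily the exact method used in the work described.

```python
# Hedged sketch: decompose word embeddings into sparse combinations of shared
# "meta-meaning" atoms via dictionary learning. All data here is synthetic;
# in practice `embeddings` would be real word vectors (e.g. word2vec / GloVe).
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((2000, 300))      # placeholder: 2000 words x 300 dims

dl = DictionaryLearning(n_components=100, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, max_iter=10, random_state=0)
codes = dl.fit_transform(embeddings)                # (2000, 100) sparse codes, one row per word

word_idx = 42                                       # hypothetical row for "apple"
top_atoms = np.argsort(-np.abs(codes[word_idx]))[:5]
print("strongest atoms for this word:", top_atoms)
# Interpreting an atom means listing the words whose codes load on it most strongly;
# for "apple", one atom might group fruit words and another tech-product words.
```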

That is to say, after we have learned a large language model, we can look for some meta-meanings in the model and try to open it. You will find that a large language model actually has many layers.

In the early layers, there is a phenomenon called "word sense disambiguation." For example, English has the word "left," which can mean turning left or be the past tense of leave. Its specific meaning depends on the surrounding context. The large language model completes this disambiguation in its early layers.

In the middle layers, you find some new meanings. One we found very interesting at the time was "unit conversion": whenever you want to convert kilometers to miles or a temperature from Fahrenheit to Celsius, this meaning is activated. You can find many similar meta-meanings along this path.

When you go further up, you will even find that there is a pattern in these meta-meanings, which is that when a repeated meaning appears in the context, it will be activated. You can use this method to open the large language model and the small language model. Of course, these ideas are not completely new. They actually have a history in visual models. For example, there have been some similar explorations since Matthew Zeiler.
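As a rough illustration of this kind of probing (not the specific experiments described above), one can feed contrasting contexts through a small open model and record how a chosen hidden unit responds. The layer and unit indices below are arbitrary choices for the sketch.

```python
# Hedged sketch of probing: record one hidden unit's activation on contrasting
# contexts (e.g. unit-conversion vs. unrelated text). Layer and unit indices are
# arbitrary illustrations, not reported findings.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

layer, unit = 6, 200            # hypothetical layer / hidden-unit choice
acts = {}

def record(module, inputs, output):
    acts["h"] = output[0]       # hidden states of this block, shape (1, seq_len, 768)

handle = model.h[layer].register_forward_hook(record)
for text in ["Convert 10 kilometers to miles.",
             "She left the room an hour ago.",
             "The weather is 30 degrees Celsius, about 86 Fahrenheit."]:
    with torch.no_grad():
        model(**tok(text, return_tensors="pt"))
    print(f"{acts['h'][0, -1, unit].item():+.3f}  {text}")
handle.remove()
```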

Silicon Valley 101:Following this line of thought, if we know how it partially works, can we optimize it a lot from an engineering perspective?

Chen Yubei:Yes, this is a very good question. I think a relatively high requirement for any theory is to be able to guide practice. So when we were working on language models and vocabulary representations, one of our goals was to see if we could optimize these models after we understood them. In fact, we can.

For example, if you find a meta-meaning in a large language model, and it activates when it sees a certain meta-meaning, then this neuron can be used as a discriminator, and you can use this thing to do some tasks. By changing these meta-meanings, you can adjust the bias of the model.

If I can find it, I can adjust it. Anthropic has recently done a similar job, which is to find some biases that may exist in the language model, and then make some changes to it to make the model more fair and safe.
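Here is a minimal, hedged sketch of the "find it, then adjust it" idea: once a direction associated with some meta-meaning has been identified, activations can be nudged along (or against) it to shift the model's behavior. The direction below is random, purely for illustration, and this is not Anthropic's actual procedure.

```python
# Hedged sketch of activation steering: push hidden states along a concept
# direction at one layer and observe how generation changes. The direction here
# is random; in practice it would be a feature found by interpretability work.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer, strength = 8, 4.0
direction = torch.randn(768)                 # placeholder concept direction
direction /= direction.norm()

def steer(module, inputs, output):
    # Add the (scaled) direction to this block's hidden states.
    return (output[0] + strength * direction,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer)
ids = tok("The new product from Apple", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0]))
handle.remove()
```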

Silicon Valley 101: I saw a study by OpenAI last year that used GPT-4 to explain GPT-2 and see how GPT-2 actually works. For example, they found that when GPT-2 answered questions about things related to American history around 1800, the 12th neuron in the 5th row would be activated, and when answering in Chinese, the 13th neuron in the 12th row would be activated.

If you shut down the neurons that handle Chinese, its ability to understand Chinese drops significantly. But for neurons further along, for example around the 2,000th row, the overall credibility of the explanations drops a lot. Have you noticed their research?

OpenAI's research: Let GPT4 explain GPT2's neurons

Chen Yubei: I haven't read this article yet, but this method is very much like performing surgery on neurons in the brain. It amounts to saying that, in some sense, a capability of the neural network can be localized rather than completely scattered, so you can operate on it: if a certain neuron is cut off, you can assume the corresponding part of its ability is largely lost.

It is actually the same for humans. For example, a person with epilepsy may have some language impairment after surgery, but other bodily functions are not much affected. The principle seems similar.
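A hedged sketch of the kind of "surgery" being described: silence one MLP unit with a forward hook and compare the model's loss on related text before and after. The layer and unit indices simply echo the numbers quoted above and are illustrative, not the actual ones from the OpenAI study.

```python
# Hedged sketch of neuron ablation: zero one MLP output unit and compare the
# language-modeling loss on a related sentence before and after. Indices are
# illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def loss_on(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

text = "In 1800, Thomas Jefferson was elected president of the United States."
before = loss_on(text)

layer, unit = 5, 12                       # hypothetical, echoing the example above
def ablate(module, inputs, output):
    output[..., unit] = 0.0               # silence this unit's contribution
    return output

handle = model.transformer.h[layer].mlp.register_forward_hook(ablate)
after = loss_on(text)
handle.remove()
print(f"loss before: {before:.3f}, after ablation: {after:.3f}")
```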

Silicon Valley 101:OpenAI and Anthropic are now studying the interpretability of large models. Is there any difference between your research and theirs?

Chen Yubei:In fact, no one knows whether the research on white-box models will be successful in the future. I have discussed this with my supervisor before, but we all agree that it is worth trying. If we go back to this point, our research is actually to understand artificial intelligence and reconstruct it through our understanding, and then fundamentally build something different. So observation, that is, explainability, I think is just a means.
In other words, whether it is opening up the model, doing these experiments, or making adjustments to the model, these are all means we try in the process of understanding. But the real importance of the white-box model is to return to the signal itself, because whether it is the human brain or the machine, the essence of their learning comes from the signal.

There are some structures in our world, and they also need to learn through these structures, and they learn these structures. So can we find the laws behind these structures, and some mathematical tools to represent them, and then reorganize these things to build a different model? If this can be done, I think it will bring about an expectation of improving the robustness, security and reliability of our system.
In addition, its efficiency will also improve. This is a bit like how the steam engine appeared before thermodynamics, and the theory then transformed it from pure craftsmanship into a science. Similarly, today we seem to have a steam engine for data for the first time: from not understanding our data before, we can finally start to build AI algorithms that capture the patterns in the data.

Silicon Valley 101:So it will be more energy efficient.

Chen Yubei:When it comes to energy conservation, I can give you a few interesting examples. The first point is definitely energy conservation, because the brain is equivalent to a 20-watt light bulb, while today's supercomputers may consume more than a million watts.

The second point is that if we look at the evolution of various organisms in nature, their evolutionary efficiency is actually very high. For example, there is a special spider called the jumping spider, which has only a few million neurons, yet it can plan very complex three-dimensional routes to stalk its prey.

Jumping spider, Wikipedia

I think the most interesting thing is how efficiently people use data. Llama 3's training data has reached about 13 trillion tokens. But how much data can a person take in over a lifetime? Assuming we receive 30 frames of images per second, 12 hours a day, for 20 years, we get about 10 billion tokens. The figure for text is of a similar order, which is far less data than a large model uses.
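For reference, the back-of-envelope arithmetic behind that "about 10 billion" figure (treating each frame as roughly one visual token, as the estimate above assumes):

```python
# Rough check of the lifetime visual-input estimate quoted above.
frames_per_second = 30
hours_per_day = 12
years = 20

tokens = frames_per_second * hours_per_day * 3600 * 365 * years
print(f"{tokens:,}")   # 9,460,800,000 -> on the order of 10 billion
```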
So the question is, how can humans obtain such a strong generalization ability with such a small amount of data? This is what I find amazing about the efficiency of the human brain.

Silicon Valley 101:Which is harder, uncovering how the big model works or uncovering how the human brain works? Both sound hard to me.

Chen Yubei:Each has its own difficulties, but the approach is similar. Whether it is the human brain or a large language model, we are trying to observe it and see what it responds to.

This method can actually be seen in the research on the visual cortex by David Hubel and Torsten Wiesel, who won the Nobel Prize in Physiology or Medicine in 1981. They found a type of simple cell and studied how these neurons fire when the subject sees something. They analyzed the different response states of neurons to different stimuli, from completely unresponsive to highly excited, and from that they found the neuron's receptive field.

DH Hubel and TN Wiesel, Nobel Prize winners in Physiology or Medicine in 1981

Today, our study of large language models is similar, finding different inputs and then understanding which neurons in the model are interested in which inputs. However, they are still different.

The first difference is that there are many limitations when observing the human brain, whether through inserting electrodes or brain-computer interfaces. However, a natural advantage of large language models is that the means of observation are no longer limited. If you have a better method, you can analyze it over a long period of time, and you can even further analyze the model through some differential methods.

But its shortcoming is that the capabilities of the large model are far inferior to those of the brain, especially the large language model, because it only learns about the world from language, so its understanding of the world is incomplete, just like a person who has no other senses but only language.

In contrast, the brain can process signals of more dimensions and has very rich senses. Sometimes we wonder whether language is complete: without support from the other senses, can all the concepts in language stand on their own, or do they need the other senses to achieve true understanding?

For example, if the thing called “refrigerator” is not associated with the feeling of hot or cold in the real world, but is only described as having statistical features such as a door, then this description is incomplete.

Silicon Valley 101:So actually, compared with the brain, the current large model still lacks a lot. But because we can disassemble it and study it, you think it will go a little further than the ambition of uncovering the secrets of the brain.

Chen Yubei: How well you can understand a large language model depends on how many means you have to observe it. For example, if there are two machines, one fully observable and the other only partially observable, then intuitively the fully observable one is easier to understand. Of course, the brain has capabilities that the model does not have, so studying the model cannot replace some of our understanding of the human brain.

Silicon Valley 101:I would like to add that Yubei studied neuroscience before. Do you think your academic background is helpful for your current research in AI? Are there any interdisciplinary research methods that can be borrowed from each other?

Chen Yubei:I didn’t major in computational neuroscience. I majored in electronics at Tsinghua University and in electrical engineering and computer science at Berkeley. But the research institute I was in at the time was a neuroscience institute, so my advisor was an expert in computational neuroscience.

Regarding the question just now, I think that learning neuroscience is usually helpful to me in the form of inspiration, because when you know these systems in nature and what they can do, you may have different ideas and look at the problems in front of you in a new way.

For example, a picture is a two-dimensional input signal: its pixels run horizontally and vertically and form a grid. But the human retina does not look like this. First of all, it is made up of receptors of different kinds, and these receptors are arranged very densely but not very regularly: very dense in the middle and sparser toward the edges.
When you face such an input signal, first of all, the convolutional neural network that we are used to will fail, because even convolution is not defined here. So when you see this situation in biological systems, you will rethink where these so-called convolutions come from.

Silicon Valley 101:So you will rethink whether the method is correct? Is it necessary to achieve it in this way?

Chen Yubei:Yes. Suppose one day you wake up and all your neurons are disrupted, can you still understand the world? Because what you see is no longer a picture, and you can no longer use convolutional neural networks to do this. What kind of method do you need?

Although we haven't completely solved this problem, we have actually taken a step forward. Even if all my neurons are scrambled, that is, the pixels in the sensor image are scrambled, there are still relationships between pixels that used to be adjacent. For example, when we look at an image, if a pixel is red, the surrounding pixels are more likely to be red as well. Through this relationship, you can let the pixels find their friends again, and similar pixels can self-organize into relationships.

Then, by adding the Transformer structure in the large language model, we can re-represent the image, and the performance of this representation is quite good. This is an example of completely re-examining some of our current engineering practices based on inspiration from nature and then proposing some different methods.
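A toy, hedged sketch of that idea (not the published method): with pixel order scrambled, correlations across many images still reveal which positions behave like neighbors, those positions can be grouped, and the groups can be fed to a standard Transformer as tokens.

```python
# Hedged toy sketch: scrambled pixels "find friends" via correlation across a
# dataset, get grouped into tokens, and the tokens go through a Transformer.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
images = rng.standard_normal((1000, 64))            # toy dataset: 1000 images, 64 "pixels"
perm = rng.permutation(64)                          # unknown scrambling of pixel order
scrambled = images[:, perm]

# 1. Correlation across the dataset reveals which scrambled positions behave alike.
corr = np.corrcoef(scrambled.T)                     # (64, 64) pixel-affinity matrix

# 2. Group correlated pixels into small token sets (greedy stand-in for real clustering).
order = np.argsort(-corr.sum(axis=1))
groups = order.reshape(16, 4)                       # 16 tokens of 4 pixels each

# 3. Treat each group as a token and process with a standard Transformer encoder.
tokens = torch.tensor(scrambled[:, groups], dtype=torch.float32)  # (1000, 16, 4)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=4, nhead=2, batch_first=True), num_layers=2)
features = encoder(tokens)                          # (1000, 16, 4) learned representation
print(features.shape)
```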

Black Box Model, AIGC image via Firefly

Silicon Valley 101:I feel that there are many similarities between studying big AI models and human brain neuroscience. Will there be neuroscientists who would like to collaborate with you on cross-disciplinary research from their perspective?

Chen Yubei:In fact, there are many neuroscientists, statisticians, and mathematicians who want to understand some structures in natural signals. They also pay attention to how neurons in the brain work, and then combine the two to try to come up with some minimalist representations of signals.

For example, you will find a phenomenon in the brain that although there are many neurons, the neurons that work at the same time are actually very sparse. For example, there are 1 million neurons, but there may be only a few thousand working.

Based on this, a sparse coding method was proposed in the early years of computational neuroscience. That is, can we find sparse, low-dimensional representations within such high-dimensional signals? The algorithm built on this idea produces representations very similar to the neuronal responses observed in the brain, so this was an early success of unsupervised computational neuroscience.
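Written out, the classic sparse coding objective (in the spirit of Olshausen and Field's work; this is the standard textbook form, not a formula quoted from the interview) is:

```latex
\min_{D,\,\{a_i\}} \; \sum_i \Big( \lVert x_i - D a_i \rVert_2^2 \;+\; \lambda \lVert a_i \rVert_1 \Big)
\qquad \text{s.t.} \;\; \lVert d_j \rVert_2 \le 1 \;\; \forall j
```

Here each signal x_i (for example an image patch) is explained as a combination D a_i of dictionary atoms d_j, and the l1 penalty keeps only a few coefficients of a_i active at once, mirroring the sparse firing observed in the brain.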

As of today, our entire research field is called Natural Signal Statistics, and its goal is to reveal some basic structures behind the signal. However, compared with large models, the development of research combined with neuroscience such as white-box models is relatively slow. I think it may be because the problem is complicated, but also because there are relatively few people working in this direction.

02 Black box model’s “overtaking on a curve”

Silicon Valley 101:Simply put, there are too few people studying white-box models now. But before the emergence of big models, does traditional machine learning also fall into the category of white-box model research?

Chen Yubei:I think this statement can be considered correct. The previous machine learning models were relatively simple and relatively understandable.

Silicon Valley 101:Then why has the research progress of the black box model surpassed that of the white box model and is so much faster?

Chen Yubei: When this question is asked, we get nervous for a moment before answering it.

Silicon Valley 101:Why be nervous?


Chen Yubei: Because this question is very sharp. It is really asking whether we should give up the white-box model, the understandable path. From our generation on, do we stop doing science in the field of AI, and does it all become an empirical discipline? I don't think so yet.

Back to your question: what happened in this process? First of all, the black-box model carries less baggage. Requiring a method both to work and to be explainable is asking a lot, so the black-box approach gives up one of the two and makes things work first.

The second reason, which is relatively overlooked by everyone, is the counter-cyclical growth of data, or the expansion of scale.

Richard Sutton mentioned in a blog post that in the past 20 years one thing has never been broken: when we have more data and more compute, we should find algorithms that can truly scale to capture the patterns in all that data. I think this is a very important point behind the black-box model, or behind our current empirical progress.

That is to say, when we have bigger data, better data, more computing, and bigger models, we can learn more. But if we go back to this question, everyone in the white box model has a pursuit, that is, the model itself must be simple.

A comparison between Black Box ML and White Box ML

Silicon Valley 101: Why does the white-box model have to be simple? Is it that if it is too complicated, it becomes hard to design?

Chen Yubei: Yes. In fact, only concise things can be understood when doing theory, and you have to simplify again and again. However, in pursuing the simplicity of the model, we may also over-simplify again and again. Once such over-simplification occurs, the model can no longer fully describe the form of the data. Then, when there is more data, the model cannot keep up and its ability is limited.

So I think this is also a difficulty that people faced when studying white-box models and simple models in the past. We not only need to carry the burden of the model needing to work and be explainable, but also need it to be simple. When you carry all these things, you will find that the burden is too heavy. When you over-simplify, you will introduce errors, and the errors will accumulate, and then you will be unable to move forward.
Silicon Valley 101: But now, with the rapid development of black-box models, we are beginning to try to solve it again.

Chen Yubei: Yes. And this time when we solve it, we may re-examine this issue. We don't necessarily need to simplify the model to that extent; it can still represent the more complex side of the world.

But at the same time, we still hope that it is more understandable, so if one day we can achieve a white box model, then I think every attempt before that is an over-simplification, but we hope that every simplification can move forward. We don’t even need to make a completely white box model, maybe we can make a white box model that is not as powerful as a large model, but relatively speaking, it is very simple.
It helps us understand the nature behind learning, and this understanding may in turn help us improve the efficiency of training large models. I have discussed the efficiency issue with Yann several times before, that is, if the theory behind it is developed, we may be able to increase the efficiency of engineering practice by orders of magnitude.
Silicon Valley 101: In Yann's view, is it more desirable to develop a white-box model or a black-box model?

Chen Yubei: Yann is a scientist known for his engineering skills, so many of his attempts are about making things work first. But Yann also supports white-box research. In my discussions with him, he felt this path was worth exploring; he didn't know whether it was an overly ambitious goal that could actually be reached, but someone had to do it.
Silicon Valley 101:I feel that the black box model is an engineering problem, while the white box model must be explained scientifically. Although from a commercial perspective, its input-output ratio is not that high, if this thing can be finally made, it will still be very valuable for the security of AI and future commercial applications.
Chen Yubei:Regarding commercialization, I think that the original intention of all people doing basic AI research is not for any application, but driven by pure curiosity about the issue of intelligence, and then they may find some rules that may help in engineering practice. The research itself is not designed for a certain application.

In addition, when we are pursuing this white box model and this extreme efficiency, we will also ask a question, that is, can the large language model we are currently working on only be achieved through this scaling or scaling law? I think it is not the case. Because people cannot accept such a large amount of data, how to use a small amount of data to obtain a relatively high generalization ability is also an important issue we are studying.


Silicon Valley 101:This should also be a problem that scholars of black box models are studying. Which scholars and schools of thought of white box models are currently studying this issue?

Chen Yubei: There are three main forces in AI. The first is the empirical one: studying these engineered models and then visualizing them to gain understanding, such as what Anthropic and OpenAI have been doing recently.

Anthropic Research: Extracting Interpretable Features from the Claude 3 Sonnet Neural Network

The second is computational neuroscience, which attempts to understand the human brain and find some ways in which memory may exist.

There is also a school of thought that looks at the basic structure of the signal from a mathematical and statistical perspective. Of course, there are many intersections between these three.
Silicon Valley 101: Which school do you belong to?

Chen Yubei: In fact, I have been influenced by all three schools to some extent. When I was at Berkeley, my mentor and Professor Ma Yi were both from the neuroscience and mathematical-statistics schools, while Yann has more of an engineering training. I think all three approaches are acceptable because they will eventually lead us in the same direction.

Silicon Valley 101: Which direction is that? Are there any interim results now?

Chen Yubei: The ultimate goal is to understand the model. We have achieved some interim results, such as building networks with just two or three layers where we can see what each layer has learned. We found that it is really possible to represent a digit by first learning all its strokes, then linking similar strokes together to construct the next level of representation, layer by layer, until we finally arrive at the digit.
Silicon Valley 101:Will your current research lead to optimization of the black box model?

Chen Yubei:First, when you have a deeper understanding of it, you may be able to optimize the black box model and make it more efficient. Second, you can unify different black box models, which reduces a lot of unnecessary waste. At the same time, another pillar of my laboratory is to study not only perception but also control.

When you give these large language models the ability to interact with the world, can you get the same generalization ability in the control system? What does that mean? In the perception system, you will find that I have learned apples, pears, and then peaches. Since I have learned the similar concepts of apples and pears before, I can quickly learn the concept of peaches.

So, can we achieve similar performance in the field of control? For example, if a robot learns to walk forward and jump in place, can we quickly turn it into a robot that can walk and jump forward at the same time?


Silicon Valley 101: If you had to give an assessment, how far along do you think white-box research currently is in unlocking the secrets of how big models work?
Chen Yubei:In fact, we don’t know how long this progress bar is. I feel that we are actually very far away from this goal. It is not necessarily a linear development, but may be more like a quantum leap. When a new cognition comes out, you may immediately take a big step forward.

If you want to make a white-box ChatGPT, I think this is still a long way off, but we may be able to make a good, fully understandable model that can replicate the capabilities of AlexNet at the time. This model can do Imagenet recognition, and we can understand how it works at each step, how it becomes a cat and a dog step by step, and how the structure of the cat and the dog is generated.

Example of WordNet used by ImageNet

Silicon Valley 101:Is ImageNet recognition a white box or a black box?

Chen Yubei:We haven’t figured out exactly how it works yet. We have some understanding from some early visualizations by Matthew Zeiler and Rob Fergus and a lot of researchers, but no one has been able to create a model that we understand every step of the way and that works well.
Silicon Valley 101: So the goal of the white-box model may be reached in phases. For example, the first step is to explain how ImageNet recognition works. After that mystery is solved, we can explain how some small models work, just like using GPT-4 to explain how GPT-2 works, and then slowly explain how the large models work.

Chen Yubei: Yes. I think this process will take quite a long time, and more people will need to invest in this direction, because most of the work is currently concentrated on the engineering side. If we put it in universities, then you actually need to have some original ideas, instead of you scale and I scale and everyone scales, where in the end there is no differentiation and it just depends on who has the best machines and the most data.

03 What I know about Yann LeCun

Silicon Valley 101:Next, I would like to discuss with you your postdoctoral advisor Yann LeCun. Let me first introduce Yann LeCun. His Chinese name is Yang Likun. He is a French computer scientist who has made many contributions in the fields of machine learning, computer vision, mobile robots and computational neuroscience. He is known as the "father of convolutional neural networks."

LeCun is currently the chief AI scientist at Meta and a professor at New York University. He pioneered the convolutional neural network (CNN) in the 1980s, a technology that became the basis of modern computer vision. LeCun, Geoffrey Hinton and Yoshua Bengio jointly won the 2018 Turing Award for their pioneering work in deep learning.
Could you please explain to our non-tech-savvy friends a little bit about Yann’s main scientific research results and why he is so famous?

Chen Yubei:Yann has been researching the field of neural network AI since the 1980s. He has experienced many ups and downs and the decline of different schools of thought, but he has always insisted on deep learning networks and is a person who has walked through the darkness.

For example, in 2000, it was very difficult to publish articles related to deep learning. How difficult was it? If your article contained the word "Neural" or "Network", the probability of your manuscript being rejected was very high. If it contained the word "Neural Network", it was basically certain to be rejected.

So it was a dark moment for them, and their funding was also affected. But they persisted through that darkness and finally walked out of it. Today, deep neural networks have changed the world. I think the Turing Award is really a tribute to them and to the other early pioneers.

Yann LeCun

Silicon Valley 101:Why did you choose his group when you were doing your postdoctoral studies?
Chen Yubei:This was an interesting adventure. I was actually quite confused at the time and didn’t even think about graduating that semester. I was determined to create a white-box model during my PhD and have it comparable to AlexNet’s performance, but I was still a little short of it.

I thought if I want to continue my research, who should I go to for a postdoc? I was at a meeting at the time, and I met Yann at the meeting. I am not a very opportunistic person, and I think everyone must want to find Yann as a postdoc, so when I met him, I mainly wanted to talk about his views on my work and his views on the future research direction of AI.

We had a very good chat at the conference. He had also thought about my research direction and some of the problems I was thinking about, but from the perspective of neural networks. So he asked me if I was interested in applying for a postdoctoral position, and of course I applied. So we hit it off right away.


Silicon Valley 101: What kind of mentor is he? Is he the type who gives students a lot of freedom to explore, or the type who helps a lot by discussing things with everyone?

Chen Yubei: The second situation is no longer possible for him, because many people need his time and he has relatively little time to allocate to each person.

He is actually similar to my doctoral advisor: very laissez-faire about the general direction, but persistent in what he believes in. He may give you a direction and a goal, but how you get there, whether by boat or by car, doesn't matter; he won't control those details.
His general direction has not changed over the years, which is self-supervised learning. Self-supervised learning is actually divided into two parts. One part is self-supervision based on perception. The other more important part is how to use an embodied way to do self-supervision, or what we are doing now is the world model. This is a direction he believes in.

I was actually the one who recommended this name to him because I read an article called World Model written by David Ha and Jürgen Schmidhuber, and I thought this name was quite cool.

A system architecture for autonomous intelligence, Meta AI

Silicon Valley 101:Do you think Yann's research direction is different from that of OpenAI and Anthropic?
Chen Yubei:If there is any difference, I think Yann wants the model to have several characteristics. The first is the ability to be embodied, which means that it is not just a pile of data, but the model can eventually explore the world on its own.
Silicon Valley 101:What's the difference? It seems that everyone hopes to achieve this result in the end.
Chen Yubei:The execution methods are different. For example, I think OpenAI uses the Scaling Law, which means more and better data, more computing, and bigger models. But Yann is more scientific. He thinks that if we want to truly move towards human-like intelligence, what do we need? He thinks that just piling up data is not enough.
Silicon Valley 101:So Yann is actually equivalent to studying both black box and white box together.

Chen Yubei:I think Yann doesn't actually care that much about whether this can develop into a science. At present, I think his views are still mainly empirical and engineering. He hopes that the system can work better, which is actually something he has always been very good at.

Silicon Valley 101:When OpenAI proved that Scaling Law can achieve good results, do you think Yann has changed his scientific research methods and thinking? Or is he still very persistent in his original route?

Chen Yubei:In fact, he is not against the Scaling Law. I don’t think there is any conflict on this issue. The real possible disagreement is that a lot of OpenAI’s work is still product-oriented and it executes engineering to the extreme, but Yann is actually conducting research in a more scientific form.

When he thinks about these issues, he doesn't really care about the product, but only thinks about how to achieve intelligence. Because he has been in this field for so long, he started to work in this field in the 1980s, so when he looks at these issues, he may still stick to his ideals.

Silicon Valley 101:Allowing intelligence to learn autonomously is the first feature of Yann’s research. What other features are there?

Chen Yubei:Another thing Yann has always believed in is JEPA, which stands for Joint Embedding Predictive Architecture. This means that the model must have the ability to learn autonomously, but more importantly, it must also be able to learn some higher-level rules when learning data.

In fact, there are currently two camps. One camp hopes to completely reconstruct the data through learning, which can be considered a compression idea. But Yann does not want to go all the way back to the image, because reconstructing the image involves too many details, and those details are not the most important information when the system has to make judgments.
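A minimal, hedged sketch of the joint-embedding predictive idea (an illustration of the principle, not Meta's actual JEPA code): the context branch predicts the embedding of the target, and the loss is computed in embedding space rather than on pixels.

```python
# Minimal sketch of a joint-embedding predictive setup: predict the *embedding*
# of a target view from a context view, never reconstructing the raw input.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, dim))
    def forward(self, x):
        return self.net(x)

context_enc = Encoder()
target_enc = Encoder()                 # in practice often an EMA copy of the context encoder
predictor = nn.Linear(64, 64)

x_context = torch.randn(32, 256)       # visible part of the input
x_target = torch.randn(32, 256)        # masked / other part of the input

z_pred = predictor(context_enc(x_context))
with torch.no_grad():
    z_target = target_enc(x_target)    # no gradient through the target branch

loss = nn.functional.mse_loss(z_pred, z_target)   # compared in embedding space only
loss.backward()
print(loss.item())
```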

Silicon Valley 101:Does his opinion on this point differ from that of your professor Ma Yi at Berkeley?

Chen Yubei: In fact, there is no essential conflict between their views; they just express them differently. Professor Ma thinks the rules of this world are simple, while Yann thinks these details are actually not helpful for downstream tasks or for making judgments, so we need to find the high-level rules.

In fact, the two are the same, because high-level rules are generally concise. Professor Ma often says that everything is compression. If you take Yann's point of view, you will find that compression is indeed correct, but the data actually has different levels of hierarchical structure.

Because the real world is complex, if you dig into the details of the real world, you will find that a lot of it is low-level structure. There is structure in data, and structure is precisely what deviates from noise. In other words, anything without structure is noise, and anything that departs from noise is structure.

We need to learn these structures, but there are different levels of structure. But when you go up a level, at a larger scale, you will find that the structure is actually not important anymore. At that level, these things have become like noise.

So Yann's point is that compression is right, but we need to have such a hierarchical learning, learning all the structures in the signal, and learning higher and higher structures. However, the highest-level structure often does not account for a large proportion of the entire compression, and it may be lost in the optimization process, because a lot of things are at the low level, and the amount of information like noise is the largest. The higher you go, the harder it is to find such a structure.

Why? Because in your optimized loss function, which is your objective function, whether you find this rule or not may have little impact on your loss. I think it mainly comes down to these two points: one is the world model, and the other is the hierarchical representation.

Yann LeCun speaking at NYU

Silicon Valley 101: What qualities of theirs do you find particularly appealing?

Chen Yubei:What impressed me most was probably the concentration and purity with which they did things.

One time I had lunch with Yann, and he said, "I have everything you wanted when you were young, but I don't have much time anymore, so I can only use the remaining time to do what I really believe in."

When you work with such scientists, you may be influenced by their temperament, so that even if you have not yet reached their current position and have the things they have, you can still see the world from their perspective.

So when you make choices or do things, you may go beyond your current position and think, what would I do if one day I had everything like him?

Silicon Valley 101:Did he change any of your decisions?

Chen Yubei:Yes, it makes me think about this when I make a lot of choices. I remember my first day of doctoral studies, my supervisor told me two things.

One is that he doesn't need me to publish a lot of articles, but he hopes that the articles I publish can transcend time, that is, 20 years later, they will still be fresh. This is actually very difficult, because many works have a distinct sense of the times, but some truly profound thoughts may still be fresh after hundreds of years. This is a very high goal, and it may only be verified when you are about to retire. But it raises a soul-searching question, that is, can you persist in doing work that can coexist with time.

Secondly, he hopes that a scholar should have his own attitude. If something could just as well be done by person A or person B, then don't do it. That is to say, if you do it anyway, you will find that it is not the work that needs you, but you who need the work; that is a speculative mentality. This is the similar temperament I see in them: the hope not to follow the crowd, to have your own attitude and find your own voice.

So when I am choosing the direction of my research, I will judge from time to time whether the work I am doing now is speculative or a real and solid job.

I think the great thing about them, especially Yann, is that they were able to go through an almost desperate period and then see the dawn. People who have not experienced the trough may not have accumulated enough depth. When you go through the darkest moment, relying on your vision and persistence to get through that stretch, and then prove you were right, that is a very interesting temperament.

Silicon Valley 101:Are there any scientific opinions Yann expresses that you disagree with?

Chen Yubei:Sometimes he makes blunt statements. For example, he recently said that as a researcher, you should not study large language models. This statement can be interpreted in many ways. If you interpret it literally, many people will disagree, including me. I may think that there are some structures in large language models that are worth understanding and studying.

Of course, what Yann really wants to say is what I just mentioned: don't do speculative work that could just as well be done by A or B. He hopes researchers can have their own persistence and make more original contributions. If that's what he meant, I would agree much more. But as a public figure with a huge following, sometimes what he says startles people and triggers a lot of discussion. That's a very interesting point to me.

Silicon Valley 101:You also worked at Meta. What do you think is Yann’s greatest contribution to Meta?

Chen Yubei:First of all, he helped to set up Meta AI. When he was setting up Meta AI, Mark was the first to find him. In addition, because he worked at Bell Labs in his early years, he longed for the state of Bell Labs back then, so he also had an ideal to replicate such a laboratory in Meta. He adhered to this concept and recruited and trained a group of very good people in Meta AI, making great contributions to this field and promoting the development of the entire field.

Silicon Valley 101: I think open source should be considered one of his very important contributions. For example, Meta's Llama taking the open-source route should be very consistent with Yann's whole philosophy.

Chen Yubei:Yes, open source is what Yann insists on. But I don't know whether Meta will continue to be open source in the future, because after all, Meta will also face competition. But I think this is Yann's idea. How well it can be implemented and how far it can go actually depends on the development of the entire environment.

Silicon Valley 101:Do you think that the entire big model research must be driven by scientists now, or will it gradually become an engineering-driven thing?

Chen Yubei:I think it has become an engineering-driven process. In the early days, it was driven by scientists. In the past two years, I think the main progress has come from the execution of engineering. Has the quality of data improved? Has the amount of data increased? Has its distribution become richer? Can the calculation be parallelized? All of these are caused by very important details in the engineering field. The development from 0 to 1 requires scientific breakthroughs, but from 1 to 100, it requires the rigor and execution of engineering. People with different roles are needed to promote it at different stages.

Silicon Valley 101:Everyone is looking forward to GPT 5 now. If GPT 5 comes out, do you think it will be more of a scientific problem or an engineering problem?

Chen Yubei:I think there is still a long way to go in engineering, and we can even think that Scaling Law still has a long way to go, and it has not reached the end at all, including data quality and computing power expansion. But at the same time, I think even if the most robust path we have found now is Scaling Law, it is definitely not enough.

So what else do we need? I think what we need is some human-like high efficiency. How can we achieve such efficiency? It may be triggered by data, but it may also be something else. So I think if we are talking about the process of leading to AGI, there should be some relatively large changes from 0 to 1.

Silicon Valley 101:That is, we need to make progress in science, but we also have a lot of room for improvement in engineering.