2024-08-08
Editor's note: Tencent Auto's editorial department calls the past decade of electrification the "stormy era" of China's auto industry. Now, at the historical juncture of 2024, known as the "first year of intelligent driving", we cannot help but ask: which technical route will the industry's major players follow, and how will they build their competitive barriers? Tencent Auto has launched a special series on intelligent driving. Through interviews, hands-on tests, side-by-side comparisons and reviews, it aims to stand at this starting point of history and better understand the sweeping changes the auto industry may undergo over the next ten years, offering readers and the industry a more comprehensive guide and leaving a valuable historical footnote.
Tencent News "High Beam"
Author Ao Dun
Edited by Shi Ding
The "end-to-end" (E2E) solution is becoming the industry-recognized optimal solution for intelligent driving. However, when people try to clear the fog and find the truth, there seem to be ten thousand types of "end-to-end" in the eyes of ten thousand car companies.
Last December, Tesla launched FSD V12 and proposed an end-to-end solution, and the term "end-to-end" was embraced by the intelligent driving community almost overnight. Huawei, Xpeng, Horizon, NIO and other companies have since proposed end-to-end solutions of their own. At the end of July, He Xiaopeng, chairman and CEO of Xpeng Motors, said that Xpeng Motors is one of only two car companies in the world to have put an end-to-end large model into mass production.
On July 5, Ideal Auto (Li Auto) released a new autonomous driving technology architecture based on an end-to-end model, a VLM (visual language model) and a world model, and the first version was pushed to thousands of test users at the end of July. Li Xiang announced in June this year that the system would be rolled out in full as early as this year or, at the latest, in the first half of next year.
According to Lang Xianpeng, vice president of intelligent driving R&D at Ideal Auto, the above-mentioned architectural design was inspired by the fast-slow system theory mentioned by Nobel Prize winner Daniel Kahneman in "Thinking, Fast and Slow". It simulates human thinking and decision-making processes in the field of autonomous driving to form a smarter and more humanized driving solution.
The fast system, i.e., System 1, is implemented by an end-to-end model, receives sensor input, and directly outputs the driving trajectory for vehicle control. The slow system, i.e., System 2, is implemented by the VLM visual language model, which receives sensor input, performs logical thinking, and outputs decision information to System 1. The autonomous driving capability composed of the dual systems will also be trained and verified in the cloud using the world model.
Comparing Ideal Auto with its competitors, Lang Xianpeng emphasized that its end-to-end model is the first One Model end-to-end model, which is very different from segmented models. "With One Model, sensor data goes in and the trajectory comes out directly, without any other rules or models in between. Other end-to-end models may still need to use some rules to string them together."
From public information, the industry regards the Xpeng Motors and Huawei solutions as segmented end-to-end. The end-to-end large model in Xpeng Motors' mass-produced vehicles is composed of the neural network XNet, the planning and control model XPlanner, and the large language model XBrain. The perception part of Huawei's end-to-end system uses the GOD (General Obstacle Detection) perception network, and the decision-making and planning part uses the PDP (Prediction-Decision-Planning) network, which handles prediction, decision-making and planning in a single network.
In the past, intelligent driving systems could be divided into several main modules: perception, prediction, planning, and control, each responsible for a different task. This is also known as the rule-based era of autonomous driving. Today's popular end-to-end approach is, to be precise, a large AI model that uses deep learning to generate output directly from raw input data. The input is the data collected by sensors such as cameras and lidars, and the output is driving behavior such as acceleration, deceleration, and braking.
In practice, the above ideas cannot be achieved in one step. On the technical side, a series of complex issues such as model architecture, data, and engineering verification need to be resolved. From the user experience perspective, the ultimate goal of the end-to-end solution is to get as close to human "experienced drivers" as possible, or even surpass experienced drivers by continuously exploring their capabilities. However, there is no consensus in the industry on when this goal can be achieved.
Jia Peng, head of the R&D of intelligent driving technology at Ideal Auto, believes that we have now reached the uncharted territory of intelligent driving. “No one has explained how end-to-end is done, and we are like ‘blind men touching an elephant’.” However, he believes that the meaning of end-to-end is not just one or two models. Its greatest significance is that it fundamentally changes the entire R&D process. “Only with a groundbreaking AI process can your system truly have human-like driving capabilities.”
Although end-to-end is considered the optimal solution for intelligent driving, it is still in its early stages and there is no clear definition or evaluation standard in the industry. Lang Xianpeng believes that domestic automakers are currently on the same starting line in terms of end-to-end, but if we look at the One Model, Ideal may be a little ahead. In his opinion, Ideal Auto's end-to-end solution is the best solution for realizing artificial intelligence in the physical world at present, "because it simulates human cognition and thinking mechanisms very well, and truly enables the system to have the ability to think and understand the world like humans. This is the advantage of the dual system."
In the longer term, end-to-end may become a competition of financial strength. Lang Xianpeng believes that sooner or later the top players will all move towards end-to-end, and the gap between them will widen. In the real era of artificial intelligence, everyone will compete on two things: first, whether there is enough high-quality data; second, whether there is a training compute cluster large enough to match it.
"In the end, everyone competes on computing power and data, but the threshold for these two things is very high. If the financial reserves are not enough to support the annual training expenses, you will not be able to stay in the game through L3 or L4. If a car company has only a small number of cars, its data will not be able to support the training needs." Lang Xianpeng said that according to preliminary estimates, Ideal currently spends 1 billion yuan on training each year, and it is expected to spend 1 billion US dollars per year in the future. "This is only the cost of training computing power, and does not include personnel and other expenses. So if you can't spend 1 billion US dollars a year on training, you may be eliminated in the future competition of autonomous driving."
The following are edited excerpts from a conversation between Tencent News' "High Beam" and other media outlets and Lang Xianpeng and Jia Peng:
Q: What prompted the shift from traditional intelligent driving technology to end-to-end technology? What are the advantages and disadvantages of Ideal's technical solution compared with those of Tesla, Huawei and Xpeng?
Lang Xianpeng: To explain the advantages of this system architecture, we have to start with our thinking about autonomous driving in August and September last year. In the past year, we have gone through three generations of technology research and development, from highways to urban autonomous driving. In cities, we first used the scenario-based NPN (Neural Prior Net) solution, then switched to the map-free solution, and then iterated to the current end-to-end solution.
During this process, we found that this kind of solution still has a very big flaw for later L3 and L4 autonomous driving. Humans can understand unfamiliar scenes and places they have never been to, and simply adapt and drive normally. But whether it is the current end-to-end solution or the map-free solution, it essentially performs well only on scenes it has seen or data it has been trained on. A new scene may not be handled correctly. If we want the system to completely replace people in driving the car, it must be able to deal with unknown scenes the way humans do.
Let's take a simple example: traffic lights. The traffic lights in Tianjin are different from those elsewhere. They are progress-bar-style lights, while those in other places are either bulb-style or countdown-style, which are easy to understand. But I believe anyone with common sense who goes to Tianjin will recognize such a thing at an intersection as a traffic light as soon as they see it, and will stop and start normally according to its signals. So we need the system to have this kind of scene understanding, or the ability to reason logically from this kind of knowledge. How can it obtain this ability? This is where we came across dual-system theory, which explains the mechanism of human cognition well. The fast system responds promptly, and the slow system handles complex thinking and logical judgment. Together the two systems constitute the mechanism of human cognition and thinking, so we started thinking about how to apply this theory to autonomous driving.
So how do we implement System 1? We finally chose to implement System 1 with an end-to-end model, and System 2 with a VLM (visual language model). These are our two specific implementation methods. After preliminary research and development, we have now implemented these two systems in our real mass-production vehicles. We believe that it is currently the best solution for realizing artificial intelligence in the physical world, because it simulates human cognition and thinking mechanisms very well, and truly enables the system to have the ability to think and understand the world like humans. This is the advantage of the dual system.
Our dual system has some unique features. First, our end-to-end model is the first One Model end-to-end model, which is very different from segmented models. Second, our VLM is the first such model that can be deployed and mass-produced on the vehicle side. Other models may be trained and tested on their own training clusters, but we are the first to optimize a VLM for, and deploy it on, the mass-production in-vehicle chip Orin X. Moreover, this model is large enough, with 2.2 billion parameters, to be a large model in a practical sense. We are also the first to propose and implement this dual system. From system architecture to system implementation, we have some advantages and characteristics of our own.
Q: Can this end-to-end plus visual language model solution support the development of L3 and L4?
Lang Xianpeng: At least from where we stand now, I think the approach is workable. Whether it ends up as end-to-end plus a VLM, the two models merged into one, a model with more parameters, or some other structure can be iterated gradually, but I think the overall idea holds.
Q: How do System 1 and System 2 divide their work?
Jia Peng: We have two models and two Orin Xs. One is the end-to-end model, which is relatively small, with about 300 to 400 million parameters, and runs at more than 10 Hz. It controls the car at a high frequency because control has to happen in real time. The VLM has far more parameters, but it cannot afford to respond only once every one or two seconds, so we have optimized it to a quasi-real-time rate of about 3.4 Hz, with a latency of about 300 milliseconds. It runs continuously and outputs two things: a decision, such as whether to slow down or steer around something, and a reference trajectory, such as whether to head for this lane or that one. These two pieces of information are fed directly into the end-to-end model, which then outputs the result. This is roughly the structure. System 1 does not completely adopt the opinions of System 2. System 2 is there to enhance the decisions of System 1.
System 1 plays a major role, and System 2 is only a reference or consultation for special situations. When it comes to L4, System 2 will play a greater role. It does not mean that System 2 is controlling the car all the time, but it really plays a very important role in decision-making and judgment. In some unknown scenarios, the ability of System 2 determines whether it can reach L4, but the basic capabilities of System 1 are a necessary guarantee for L3.
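The division of labor described above can be illustrated with a small scheduling sketch. It is only a toy under stated assumptions, not Ideal Auto's implementation: the two rates (10 Hz and 3.4 Hz) come from the interview, while the `Advice` type, the placeholder functions `system2_vlm` and `system1_e2e`, and the decision to ignore the roughly 300 ms VLM latency are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional

SYSTEM1_HZ = 10.0  # fast end-to-end model ("more than 10 Hz" in the interview)
SYSTEM2_HZ = 3.4   # slow VLM ("about 3.4 Hz" in the interview)


@dataclass
class Advice:
    """Hypothetical container for what System 2 hands to System 1."""
    frame: int            # control tick at which the advice was produced
    decision: str         # e.g. "slow_down"
    reference_lane: int   # stand-in for the reference trajectory


def system2_vlm(frame: int) -> Advice:
    """Placeholder for the VLM: returns a decision plus a reference."""
    return Advice(frame=frame, decision="slow_down", reference_lane=0)


def system1_e2e(frame: int, advice: Optional[Advice]) -> dict:
    """Placeholder for the end-to-end model. It outputs on every tick and
    treats the advice as one more input, not as a command."""
    return {"frame": frame, "advice_frame": advice.frame if advice else None}


def run(duration_s: float) -> list:
    """Run System 1 every tick and refresh System 2's advice at its own rate."""
    ticks = int(duration_s * SYSTEM1_HZ)
    next_s2_time = 0.0
    latest: Optional[Advice] = None
    outputs = []
    for tick in range(ticks):
        now = tick / SYSTEM1_HZ
        if now >= next_s2_time:          # time for a new slow-system result
            latest = system2_vlm(tick)
            next_s2_time += 1.0 / SYSTEM2_HZ
        outputs.append(system1_e2e(tick, latest))
    return outputs


if __name__ == "__main__":
    for out in run(1.0):
        print(out)
```

In this sketch System 1 produces an output on every tick, while the advice it sees is refreshed only every third tick or so, which mirrors the "reference or consultation" role described for System 2.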
Q: Will the two systems be merged into one in the future?
Jia Peng: This is the next step in our pre-research. The current idea is to mass-produce two models. The map-free (Wutu) 6.0 version has already been rolled out nationwide, and we want the end-to-end + VLM system to be just as easy to roll out nationwide. Then, how do we build a mass-produced L4 in the future? Our idea may be to make the model larger in scale and capacity, and at the same time raise its frame rate. Or, if there is a chance, the two models can be combined into one, letting the model decide for itself whether to use System 1 or System 2. So if chips with greater computing power and better platforms arrive in the future, this can play a huge role.
Q: Why can't VLM be called end-to-end? In my opinion, it is also end-to-end.
Jia Peng: If the computing power is large enough in the future, the VLM itself can run in real time, for example at more than 10 Hz or even 20 Hz, and perhaps it can then achieve an end-to-end fast response as well. But currently the VLM actually works as a multi-round question and answer. We ask it: how should I drive in this situation? Why drive this way? What will happen if I do?
Lang Xianpeng: From our point of view, any purely data-driven model is end-to-end. Its input is data and its output is the result. The result is a trajectory in System 1 and a decision in System 2. What I would like to emphasize is that there is a big difference between end-to-end across multiple models and end-to-end in one model. For example, the One Model we are working on takes in sensor data and directly outputs the trajectory, without any other rules or models in between. Other end-to-end systems may still need to use some rules to string them together.
Q: What is the upper limit of the current system’s capabilities?
Lang Xianpeng: VLM is now at the edge of a no-man's land. To explore further, all companies, including us, must do end-to-end. However, I believe that we are the first company to do this, and we will have our own explorations in the process. In doing so, we have found that the performance improvement brought by data scale has not yet hit an upper limit. We are still exploring where the boundary between more data and better performance lies, and we have not yet reached it.
Our analysis is that the computing power of chips is limited, so the parameter scale is also limited. Our end-to-end model now has a parameter scale of about 300 million, and there is an upper limit to how much training data a model of that scale can absorb. It is impossible to feed it indefinitely.
Jia Peng: Although computing power has improved, the more serious bottleneck of in-vehicle chips for large models is memory bandwidth. We have really reached relatively uncharted territory, and no one has explained how end-to-end is done, so we are like "blind men touching an elephant".
Our end-to-end model outputs the trajectory, and we add some safety margins after it. Until the model reaches its upper limit, there are still some things that need to be handled, such as sudden steering wheel movements, which can be avoided this way. This is our solution.
Q: How do you define the integrated model as being more powerful and advanced than the segmented model? What is the ultimate ceiling for end-to-end development? Will there be more powerful models in the future?
Lang Xianpeng: First of all, I don't think it is a matter of good or bad, only of what is suitable. If you want to do L3, L4 or higher-level autonomous driving, I think the integrated end-to-end model is the one you must choose, because it is not only a choice of model, it is a choice of a more advanced iteration and R&D process. The segmented and earlier models are still very suitable for L2 assisted driving.
With end-to-end, the change is not as simple as one model or two models. The whole way of thinking, the R&D process and the way of doing things have undergone tremendous changes. There are no rules here. What I can do is feed the model high-quality data and train it to improve its capabilities, so that it can make better plans and decisions.
Then I need to reasonably iterate the framework of the model. The most important way is to find better quality data. The data must be large enough and of good quality. We have basically reached the level of 3 million parameters, and our data selection is very particular. First of all, the people who drive with our product team and our subjective evaluation team are all experienced drivers with excellent driving experience.
They worked with us to develop a set of standards for veteran drivers, covering things such as safe driving behavior and driving style. After iterating on several dimensions, we used this rule to screen our existing 800,000 car owners. We only wanted those with scores above 90, and their clips then had to be screened as well. Because we have this base, we can filter out 1 million or 10 million high-quality clips. On the surface, there may be only 10 million, but in fact these tens of millions of kilometers of data may have been filtered out of 1.2 billion kilometers of data. This is one reason.
When we screen data, we have our own tool chain behind it. It is not just a matter of picking and choosing. We also have our own data proportions and data recipes, which is also critical.
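The screening step described above (keep only drivers scoring above 90, then watch the proportions of what remains) can be sketched roughly as follows. The threshold of 90 is from the interview. The `Clip` fields, the scenario tags and the two helper functions are hypothetical illustrations, since the interview does not describe the actual tool chain or scoring rule.

```python
from dataclasses import dataclass

SCORE_THRESHOLD = 90  # "scores above 90" per the interview


@dataclass
class Clip:
    clip_id: str
    driver_score: float  # hypothetical 0-100 "veteran driver" score
    scenario: str        # hypothetical scenario tag, e.g. "urban"


def select_clips(clips, threshold=SCORE_THRESHOLD):
    """Keep only clips recorded by drivers scoring above the threshold."""
    return [c for c in clips if c.driver_score > threshold]


def scenario_mix(clips):
    """Share of each scenario in a clip set, a stand-in for the
    'proportions' mentioned in the interview."""
    counts = {}
    for c in clips:
        counts[c.scenario] = counts.get(c.scenario, 0) + 1
    total = len(clips)
    return {s: n / total for s, n in counts.items()} if total else {}


if __name__ == "__main__":
    fleet = [
        Clip("a", 95, "highway"),
        Clip("b", 90, "urban"),       # exactly 90 is not "above 90"
        Clip("c", 92, "urban"),
        Clip("d", 60, "roundabout"),
    ]
    selected = select_clips(fleet)
    print([c.clip_id for c in selected])
    print(scenario_mix(selected))
```

A real pipeline would presumably rebalance the selected set toward a target mix rather than just report it. That step is left out here because the interview gives no detail on it.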
Q: Some companies say that a lot of previous data is useless in the end-to-end era. They are now experiencing the most painful thing, demolishing the old bridges, building new bridges, and building security systems that can test them. What do you think of this statement?
Lang Xianpeng: In my opinion, his statement is contradictory. He said that data is not that important, but his statement also shows that data is very important. In fact, at Ideal we realized this long ago. What is the most important thing for autonomous driving? Is it talent and capital? I think it is data. Without data, there will be no foundation for algorithm training and verification in the future.
We have been accumulating data and building our data platform since we delivered our first car in 2019. Starting with the Ideal L9, all our models have followed a "nesting doll" design, which is extremely beneficial for autonomous driving. All camera specifications and installation locations are consistent. Although the vehicles differ slightly in length, we can reuse the data. However, some manufacturers may have both sedans and SUVs, and their sensors may also differ, so it may be a challenge for them.
Q: Some people say that end-to-end will simplify the intelligent driving development process and reduce labor costs. What do you think?
Lang Xianpeng: If we use this solution, we really won't need so many people. The entire end-to-end R&D process, to put it simply, is to select data, train models, evaluate models, and use world models. The world model is internally called System 3, which is a test system. The capabilities of System 1 plus System 2 are evaluated and certified by our System 3. Previously, our evaluation and testing of this type of autonomous driving system was performed by humans, whether on open roads or on closed courses, and humans cannot evaluate everything.
It is impossible to use humans to track the changes in the millions of kilometers of roads across the country throughout the year. Cities are not like highways. The highways in Beijing are not much different from those in Guangdong, but in urban areas humans really cannot keep up. So we have System 3, which helps us test the capabilities of System 1 and System 2. If they pass the test, the iteration goes live and the next round begins.
In this process, except for the process of developing these system platforms, there are not many people involved in the actual work, which will greatly reduce the use of people. It will also have many benefits for the optimization of management and personnel use within our organization. Therefore, some of our subsequent adjustments are actually based on the changes in this business, not just for the sake of adjustment as everyone imagines.
Q: From the perspective of consumers and users, what kind of upgrade in experience will end-to-end technology bring once it is implemented?
Lang Xianpeng: From the user's perspective, whether you use end-to-end or other technologies, and whatever your technical solutions and routes are, users only care about the experience. So when we push the end-to-end plus VLM product to all users in the future, we hope to give users the feeling that a very experienced driver is driving for them.
Users don't necessarily need to know what this technology is, but if they are interested, we can provide plenty of reference material. We will not overemphasize to users what technical solutions we use. We just communicate with users about what kind of product experience they will get.
Q: If the end-to-end solution is to be officially pushed to users, what do you think is a good standard? When can it be officially pushed?
Jia Peng: I think the standard is still user experience. Why do we have 1,000 early bird users, instead of setting some takeover targets ourselves? If the experience of 1,000 users and 10,000 users is good, or if it surpasses the experience of the map-free version, I think it can be rolled out. We have conducted some evaluations among the people currently participating in the early bird test and found that its experience, stability, and safety have all reached the standard.
Q: Will this gradual approach turn to One Model? Is One Model the only correct direction?
Jia Peng: From our perspective, One Model is end-to-end, and the other approaches are not end-to-end. But if someone wants to add a qualifier, that's fine. In fact, our map-free (Wutu) version is segmented. At that time, we called the parts the perception model and the predictive planning model, and that's just what we called them. But if you want to call it segmented end-to-end, that's fine too.
The meaning of end-to-end is not just one model or two models. Its greatest significance is that it fundamentally changes the entire R&D process. Only with a groundbreaking AI process can you truly make your system have the same driving capabilities as humans.
Before, we only talked about functions. I had a function for getting through a ramp and a toll booth, but now I have the driving ability of an experienced driver. You may be able to experience our end-to-end system later. Of course, I have driven this car a lot. I can say that the first version drove crookedly, but now it drives very well. I am often surprised at the performance and capabilities of this model.
When it was fed with 800,000 data points, it still couldn't pass the roundabout, but when it was fed with 1 million data points, it suddenly passed the roundabout one day. In fact, we didn't deliberately get any roundabout data for it, we just kept feeding it data. It's just like teaching a child, he has a class today, and another class tomorrow, and suddenly one day he comes to you and speaks a few words of English.
End-to-end is different from previous R&D. In previous product R&D, I knew that you would perform like this in the future because I designed you like this. The end-to-end model has its own growth and emergence capabilities, or you may only discover its capabilities, but you cannot design its capabilities. I think this is a very big difference.
Q: Did you encounter any major challenges during the end-to-end process?
Lang Xianpeng: Actually, there are many challenges. The most important one is that we have done some preliminary research work in advance. This is one of them.
Secondly, from Ideal to our team, the company has a consistent and in-depth understanding of intelligent driving and artificial intelligence. The biggest challenge is whether everyone has a consistent understanding of this matter, whether some people think it is radical, some think it is conservative, or some think the solution is reliable or not.
I actually spent a long time explaining how we went from NPN to map-free and then to end-to-end step by step. This is the process of discovering and solving problems. After cognitive alignment, decision-making is very fast. Ideal Auto's strong execution ability is the result of our training and accumulation over the past few years.
In terms of organization and efficiency, our construction of the data-driven tool chain, or the infrastructure of this system, over the past five years has been very critical. Even if you have people, computing power, and data, you cannot operate efficiently without a complete and efficient tool chain. We need an automated data closed-loop infrastructure to collect data, label samples, train automatically, then evaluate automatically, and then iterate. We have been iterating on all of this from the first car in 2019 to the present, so Ideal's data closed-loop infrastructure is absolutely top-notch in the industry.
Q: You once mentioned that Ideal's intelligent driving experience lags behind Tesla by half a year. How did you come to this conclusion?
Lang Xianpeng: Starting from Tesla FSD V12.3, we have been going to the United States for tests regularly. We have tried it on both the West Coast and the East Coast. This is our own summary. Tesla is really great on the West Coast of the United States at present, because it has the most data in California. But when you get to Boston and New York, you will find that its performance drops sharply. In New York especially, its MPI (Mileage Per Intervention) is basically down to about 10 or 11. In fact, there is no generation gap between the takeover level in New York and the top performance in China. And even the road conditions in New York are much less complicated than those in Shanghai and Guangzhou, so this is why we dare to draw this conclusion.
On the other hand, Tesla (in the United States) can obtain a lot of information that is not available in China, such as map information. Google provides a lot of road structure information, while domestic navigation maps will not give you this. Tesla achieves this experience on a very good foundation. That's why we said that if FSD comes to China, it should actually be tested in Shanghai now. I think it will require a lot of work, including on maps, because it cannot get such rich information from the maps, and it will have to make a lot of modifications, so we made this judgment.
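For readers unfamiliar with the MPI figure quoted above, it is simply distance driven divided by the number of driver takeovers. The sketch below uses made-up numbers purely for illustration. The interview does not say what distance unit or sample size sits behind the "10 or 11" figure, so the function name, the unit-free `distance` argument and the example values are all assumptions.

```python
def mileage_per_intervention(distance: float, interventions: int) -> float:
    """Average distance driven between driver takeovers.

    `distance` is in whatever unit the tester logs (not stated in the
    interview). With zero interventions the ratio is unbounded, so
    infinity is returned.
    """
    if distance < 0 or interventions < 0:
        raise ValueError("distance and interventions must be non-negative")
    if interventions == 0:
        return float("inf")
    return distance / interventions


if __name__ == "__main__":
    # Hypothetical test drive: 110 units of distance, 10 takeovers.
    print(mileage_per_intervention(110.0, 10))
```

A higher MPI means fewer takeovers per unit of distance, which is why a drop to around 10 or 11 is read as a sharp decline.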
Q: Ideal’s goal this year is to become the absolute leader in the intelligent driving field. What dimensions will be used to define it?
Lang Xianpeng: I think it all comes down to volume. Is our AD Max sales volume leading the market this year? This is the most hardcore indicator. I only look at Max cars, not the total volume. If I sold 50,000 cars this month, but only 10,000 of them were AD Max, it means that my AD Max failed. If I succeed, Max will account for a high percentage.
In the one month since we launched 6.0, we have seen an increase in the number of people visiting our stores, and sales have also increased. The proportion of our users' orders for AD Max has increased from 37% in May to 49%, and 75% of the orders for L9 models have been for AD Max. I think the most convincing thing is that users actually pay for your products.
Internally, we also reflected at the strategy meeting in March this year that we should not pay too much attention to competition. Why did everyone complain about the mediocre quality of our first map-free (Wutu) version in the first half of this year? The problem was that we paid too much attention to competition. At that time, we regarded Huawei as a very good competitive benchmark, so its takeover rate and product indicators became our targets. Judged by these indicators alone, our version was not bad, but the user experience was not good, so in the end we switched to judging by user experience and evaluation instead of just looking at indicators. Indicators are still a reference, though, and we need to look at them.
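The arithmetic behind these figures is simple enough to check directly. The sketch below recomputes the share in the 50,000 / 10,000 example and the change from 37% to 49%. The function names are hypothetical, and the inputs are only the numbers quoted in the interview.

```python
def ad_max_share(ad_max_orders: int, total_orders: int) -> float:
    """Fraction of orders that are AD Max."""
    if total_orders <= 0:
        raise ValueError("total_orders must be positive")
    return ad_max_orders / total_orders


def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


if __name__ == "__main__":
    # The "failure" example: 10,000 AD Max cars out of 50,000 sold.
    print(f"{ad_max_share(10_000, 50_000):.0%}")
    # Order share moving from 37% in May to 49%.
    print(percentage_point_change(37, 49), "percentage points")
```

In other words, the hypothetical failure case corresponds to a 20% share, and the reported order mix moved by 12 percentage points.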
Q: After Tesla FSD is implemented and solves some problems with China's road conditions, some leading car companies may all stand on the same starting line. What will everyone compete for then?
Lang Xianpeng: This is also related to some of our subsequent plans. Starting with end-to-end, everyone is really using artificial intelligence to do autonomous driving. I believe that sooner or later, the top players will all do this. Once they enter this direction, the gap between everyone will definitely widen. It is not like now, when you can do assisted driving with a 7,000 yuan car, right? You can do it with one Orin, two Orins, or four Orins. But when you really enter the era of artificial intelligence, everyone is actually competing on two things.
First, do you have enough high-quality data? Second, do you have a cluster with sufficient training computing power to match it? So in the end, everyone is competing on computing power and data, but the thresholds for these two things are very high. If your company's financial reserves are not enough to support your annual training expenses, you will not be able to stay in the game through L3 or L4. If your car company does not have a large fleet of such cars on the road, your data will not be able to support your training needs.
We have made a preliminary estimate that Ideal currently invests 1 billion RMB in training each year, and we estimate that the cost will reach 1 billion USD per year in the future. This is only the computing power for training and does not include other personnel and various expenses. So if you cannot spend 1 billion USD a year on training, you may be eliminated in the future competition for autonomous driving.
Q: 1 billion US dollars a year, how do you deduce this?
Lang Xianpeng: Looking at model parameters is the most direct way. Taking Tesla as an example, from FSD V12.3 to V12.5 the model was expanded by 5 times, and the computing power also expanded by 5 times. Our current end-to-end model has about 300 to 400 million parameters, and the VLM has 2.2 billion parameters. By the Thor generation, on-board computing power will have increased a lot, and the model cannot remain unchanged. In order to raise the upper limit to L3 and L4, the training computing power will naturally have to increase exponentially. I think this is the logic.
Q: Are domestic manufacturers now on the same starting line in terms of the end-to-end path?
Lang Xianpeng: Domestic manufacturers are on the same starting line in terms of end-to-end, but I think that if we look at the One Model, Ideal may be a little ahead. Based on the One Model, we were the first to release our own "bird egg" test version, and it was a relatively large-scale release, delivered to thousands of people. In addition, everyone has personally experienced, in actual use, the performance and experience improvements this end-to-end version brings compared with the previous map-free version. My judgment just now was based on this.
Q: Regarding the issues of computing power and purchasing cards, does the company support it?
Lang Xianpeng: Our company is also very supportive. Now Li Xiang comes to ask every now and then, "Dr. Lang, do you have enough cards? If you don't, ask someone to help you solve it." I said yes, thank you. Although we have done a very extreme job in all aspects of management, I think Li Xiang has a relatively thorough understanding of artificial intelligence. So we are not particularly worried about computing power and the like. When Brother Xiang approves the budget, I think he will consider these.
Q: You mentioned that not every car company is capable of autonomous driving. In terms of computing power, how much reserve is needed to meet the entry ticket standards?
Lang Xianpeng: Right now, in Ideal's practice, we must have a computing power expenditure of 1 billion RMB per year. If you don't have it, either the iteration speed will be slow or the product competitiveness will be insufficient. In the future, we think that 1 billion US dollars per year may be a necessary computing power investment. We have roughly estimated it ourselves. We now have about 15,000 cards, which is quite tight. I coordinate how to allocate cards every day, but as the number of model parameters increases, I think at least 3-4 times that (investment) is more reasonable. Because the computing power itself has been improved a lot, its bandwidth and storage have also been improved a lot. I think it is basically based on the feeling of less than 100,000 A100s, which may be about 3 billion Flops of computing power.
Q: Is this kind of investment endless, or is there an upper limit? Or will it stabilize at a certain point? How do you ensure the balance of commercialization?
Jia Peng: In the past two years, the number of model parameters has increased from tens of billions to trillions, or even 10 trillion. This is a very steep curve, but recently everyone has been reflecting on whether bigger is always better. Now it has started to shrink a little. Maybe some large models in professional fields do not need so many parameters. As long as the data quality is good enough, the number of model parameters may not need to be so large. This is a hype curve. It may fall again after a while, but I think it will eventually reach a stable state. Whether it is the number of model parameters or computing power, there will be such a process. Everyone climbs quickly at first, then may fall back somewhat, and only then does it become truly practical. This is the process.
Q: In the first half of the competition, electrification, Tesla and BYD have left their competitors far behind. What will the second half, the smart driving competition, look like?
Lang Xianpeng: The first half is electrification, and the second half will definitely be intelligence. In the future, everyone will definitely see some of our investments and performance in intelligence. End-to-end is just the beginning.