news

Ideal Auto intelligent driving R&D VP Lang Xianpeng: Without $1 billion in profit, we can't afford autonomous driving in the future | 36Kr exclusive interview

2024-08-06


Interview | Li Qin and Li Anqi

Text | Li Anqi

Editor | Li Qin

In early June, the day before his speech at the Chongqing Automotive Forum, Li Xiang, CEO of Ideal Auto, changed his talk at the last minute. The team had originally prepared a topic on artificial intelligence for him, but Li Xiang wanted to talk more about autonomous driving.

Li Xiang said at the forum that autonomous driving will one day be like a human driver, with both quick reactions and the logical reasoning needed to handle complex events. The answer Ideal has found is end-to-end + a VLM (visual language model), which is also the hottest topic in the intelligent driving industry right now.

A month later, Ideal Auto's intelligent driving team presented the "end-to-end + VLM" solution in detail. Unlike the "segmented end-to-end" solutions of its domestic peers, Ideal's approach is closer to Tesla's: a single large network known as "One Model".

To outside observers, Ideal's intelligent driving has always been playing catch-up. In last year's fierce race to open city NOA, Ideal changed its route repeatedly to keep pace with the industry: from relying on high-precision maps, to lightweight maps (NPN feature networks), to dropping high-precision maps entirely.

Lang Xianpeng, vice president of intelligent driving R&D at Ideal Auto, and Jia Peng, head of intelligent driving technology R&D, were recently interviewed by 36Kr. Looking back on this journey of catching up, Lang Xianpeng concluded, "The core principle is whether you can find the essence of the problem, and then make up your mind and correct it quickly."

Choosing the "end-to-end" technology route is a continuation of this principle. Lang Xianpeng said that in the past, whether the system used maps or went map-free, the underlying technical architecture was essentially the same. Under the existing "perception to planning and control" process, if the upstream perception output was flawed, downstream planning and control had to keep patching the holes, "which requires a lot of manpower and resources."

Of course, resource investment is still a secondary issue. The core problem is that "the rule-based intelligent driving experience has a ceiling and can never be human-like."

"End-to-end + VLM + world model" is the best artificial intelligence implementation paradigm that Ideal has found.

In simple terms, Ideal's end-to-end solution does away with the original intelligent driving system's separate, rule-dependent modules, such as perception, prediction, planning, and control, and merges them into one large neural network. "Sensor data goes in, a planned trajectory comes out," Lang Xianpeng summarized.
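For readers who want a concrete picture, here is a minimal, illustrative PyTorch sketch of that "One Model" idea: raw sensor data in, planned trajectory out, with no hand-written perception, prediction, or planning modules in between. The module names, layer sizes, and inputs are invented for illustration and are not Ideal's actual architecture.

```python
# Minimal, illustrative sketch of the "One Model" idea: sensor data in,
# planned trajectory out. All names and sizes are invented placeholders.
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    def __init__(self, num_waypoints: int = 20, ego_state_dim: int = 4):
        super().__init__()
        self.num_waypoints = num_waypoints
        # Stand-in image encoder for whatever backbone processes camera frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Ego state (speed, steering angle, ...) is fused with visual features.
        self.fuse = nn.Linear(64 + ego_state_dim, 256)
        # The head regresses the trajectory directly: (x, y) per future waypoint.
        self.head = nn.Linear(256, num_waypoints * 2)

    def forward(self, camera: torch.Tensor, ego_state: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(camera)                                  # sensor data in
        fused = torch.relu(self.fuse(torch.cat([feats, ego_state], dim=-1)))
        return self.head(fused).view(-1, self.num_waypoints, 2)       # trajectory out

# Imitation-style training on human-driving clips would minimize the gap between
# the predicted trajectory and the trajectory the human driver actually took.
model = EndToEndDriver()
camera = torch.randn(1, 3, 224, 224)     # one camera frame (placeholder)
ego_state = torch.randn(1, 4)            # speed, steering, etc. (placeholder)
print(model(camera, ego_state).shape)    # torch.Size([1, 20, 2])
```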

The VLM (visual language model) gives the end-to-end model a ChatGPT-like plug-in. The end-to-end model's behavior is determined entirely by the data it has been fed, whereas the VLM can understand the world and reason logically. In complex scenarios, the end-to-end model can query the VLM in real time, and the VLM returns driving suggestions.

The world model functions as a huge question bank: simulated data generated through reconstruction + generation is combined with the real cases Ideal has accumulated to form "real questions + simulated questions" for testing the end-to-end model. Only a model that passes this exam with a high score is pushed to users.

Inside Ideal, these three models are called System 1, System 2, and System 3. System 1 corresponds to the brain's fast, intuitive thinking, System 2 to its logical thinking, and System 3 is the examiner, responsible for checking and accepting the training results of System 1 and System 2.

End-to-end intelligent driving was pioneered by Tesla. In August 2023, Musk demonstrated the capabilities of the end-to-end FSD v12 in a live broadcast; FSD has since been upgraded to v12.5. Unlike Tesla, however, Ideal has introduced VLM visual language model capabilities in addition to the end-to-end and world models.

Jia Peng told 36Kr that he had spent a week on the US East and West Coasts testing Tesla's FSD and found that even "end-to-end" has an upper limit. On the East Coast, where road conditions are complex, in cities such as New York and Boston, Tesla's takeover rate rises significantly. "The number of parameters of an end-to-end model that can run on HW3.0 is not particularly large, and model capacity also has a natural upper limit."

In Ideal's design, the VLM's role is to raise the ceiling of the "end-to-end" model. It can learn about things like potholes and school zones, take charge of situations such as construction and roundabouts, and provide decisions to the end-to-end system at critical moments.

Lang Xianpeng and Jia Peng both regard the VLM as the biggest variable in Ideal's intelligent driving system. The VLM currently has 2.2 billion parameters and a response time of 300 milliseconds; with a higher-compute chip, the deployable parameter count could reach tens of billions, which they see as the best path to advanced L3/L4 autonomous driving.

"VLM itself is also following the development of large language model technology. No one has yet answered how large the number of parameters will eventually be," said Jia Peng.

It is not hard to see that being data-driven and relying on large visual language models has pulled the intelligent driving industry into the computing-power race started by OpenAI, Microsoft, Tesla, and others.

Lang Xianpeng did not hide the fact that at this stage, what everyone is competing for is the quantity and quality of data, as well as computing power reserves. High-quality data is based on an absolute data scale; to support the training of the L4 model, it probably requires dozens of EFLOPS of computing power.

"Any company without a net profit of 1 billion US dollars cannot afford to play a role in the future of autonomous driving," Lang Xianpeng said bluntly.

At present, Ideal Auto's cloud computing power is 4.5 EFLOPS, which has quickly narrowed the gap with the leading company Huawei. According to 36Kr Auto, Ideal recently bought a large number of Nvidia cloud chips, "basically buying all the cards that the channel dealers have."

CEO Li Xiang also sees where this competition is headed: use resources and AI technology to pull away and leave the competition behind. He often asks Lang Xianpeng, "Are the computing resources sufficient? If not, ask Xie Yan (Ideal's CTO) to get some more."

"We have cars and more money than others, so we have a great opportunity to widen the gap with them on this road," said Li Xiang. According to the financial report, as of the first quarter of this year, Ideal Auto's cash reserves were close to 99 billion yuan.

Ideal's internal data suggests that the commercial closed loop of intelligent driving is starting to take shape. In early July, Ideal began delivering the 6.0 intelligent driving version, usable nationwide, to users of the AD Max version. Lang Xianpeng found that the share of Ideal's Max models quickly exceeded 50%, "with an increase of more than 10% every month. A 2-3% change can be read as normal jitter, but more than 10% is real growth."

Lang Xianpeng also knows that although the vision of L4 autonomous driving has become clear, the path to its realization has not changed. "We have to help the company sell cars quickly. Only by selling cars can we have money to buy cards and train intelligent driving."

If intelligent driving is the key to success in the future automotive battlefield, it is obviously a more brutal resource game. Ideal has made advance preparations from top-level strategy to technical preparation and resource investment. What about others?

The following is an edited conversation between 36Kr Auto and Ideal Intelligent Driving R&D Vice President Lang Xianpeng and Ideal Intelligent Driving Technology R&D Director Jia Peng:

Talking about the ceiling of intelligent driving: with or without a map, the architecture is the same

36Kr Auto: Have you reviewed the situation internally? How did you quickly improve your intelligent driving capabilities from a backward state to a level comparable to Huawei's?

Lang Xianpeng:In fact, compared with Xiaopeng, Weilai, and Huawei, it's not that we are smarter. We may even have fewer people than they do, but we insist on seeking truth from facts. Sometimes people simply don't find the essence of the problem; when they hit difficulties, they just think about tweaking and iterating on what they are already doing.

For example, in going from having a map to not having a map, the map itself was the biggest problem. We had done a lot of work on maps before, and the temptation was to struggle on a little longer, when in fact we should have moved quickly into the next stage of R&D. It comes down to whether we can find the essential problem and make up our minds to correct course quickly.

36Kr Auto: Ideal has achieved nationwide map-free intelligent driving, but went through many versions along the way. How did you keep correcting course?

Lang Xianpeng:At the Shanghai Auto Show last year, we started working on city NOA. Every company's idea was similar: high-precision maps were already used on highways, so we first looked at whether the highway solution could be used in cities. That meant asking the map vendors. AutoNavi said it had high-precision city maps, but only for about 20 cities. We said we would give it a try first.

But the iterative updates of the solution and the map were tied together. When we were working in Wangjing, if a road was repaired, a lane changed, or even a traffic light moved, we had to wait for AutoNavi to update the map before we could keep going. Around June last year, we decided to stop relying on refreshed maps and use the NPN (neural prior network) solution instead. It is equivalent to local mapping: NPN prior information is used at major intersections and roundabouts, and our own cars keep the features updated.

That works in big cities like Beijing, Shanghai, Guangzhou, and Shenzhen, where there are plenty of cars, but how do you keep things updated in small cities with few cars? Should we only ever do it in big cities? Users won't accept that. At the time the team was still hesitant: Beijing, Shanghai, and Guangzhou were doing fine, and there were voices inside asking why not cover just a few first-tier cities instead of 100. After all, Huawei only had 50 cities at the start, so we didn't have to be first or second.

I said that wouldn't do; we still had to move fast, and I wanted to know whether the NPN approach would hold up if it were really done at a larger scale. The problem was that maps always had limitations, and there were complaints that some cities had only a couple of roads covered. So, having learned the painful lesson, after delivering 100 cities in December last year we started switching to the map-free solution.

36Kr Auto: From map-free NOA to end-to-end, what made the change necessary?

Lang Xianpeng:There were still problems without maps. The map used to provide some relatively accurate prior information; once that was removed, the requirements on upstream perception became particularly high. For downstream planning and control, the inputs used to be very regular, but now perception brings jitter and errors, which also poses great challenges.

Continuing down this road takes a lot of manpower. For example, if there is a problem with perception, many rules have to be added to the intermediate environment model; if downstream planning and control is affected, more rules are added to compensate. That is a big strain on the team's headcount. This is how Huawei's map-free solution came about (a manpower advantage). We also wanted to hire more people in the second half of last year.

But the ceiling of this approach is quite obvious, mainly because all the rules are written by people and designed by engineers. Especially around January and February this year, we would change a rule and fix one case only to break others. The rules were too entangled with one another, and it never ended.

Of course, the investment of resources is secondary. The most important thing is that the rule-based experience has a limit and can never be human-like. So we iterated to the current end-to-end and VLM. End-to-end is the first time that artificial intelligence is used for intelligent driving.

36Kr Auto: When did Ideal start investing in end-to-end?

Lang Xianpeng:We always have two lines of work. The visible line is mass production and delivery; last year that was the path from NPN lightweight maps to map-free. End-to-end was the hidden line, our pre-research line.

It was only made explicit at last year's Yanqi Lake strategy conference, where Li Xiang said that autonomous driving is our core strategy and that R&D must reach important milestones. The end-to-end idea had been around for a long time, but there had always been delivery pressure and no resources to explore it.

36Kr Auto: The map-free version had not been out long before end-to-end was announced. How was this pace decided?

Lang Xianpeng:At the beginning of the year, I told Li Xiang that although we want to do end-to-end, we still need to do map-free, because map-free is the foundation for end-to-end. If we don't do map-free, where will the data and experience to switch to end-to-end come from?

And we had to use the map-free version to sell cars first, otherwise how could we compete with Huawei? Now that map-free is out, it buys time for end-to-end and, at the same time, lifts product strength enough to help sell cars.

36Kr Auto: You have been correcting course all the way, repeatedly overturning your own plans. Do you feel pressure in managing upward?

Lang Xianpeng:No. First, my responsibility is to lead everyone to achieve autonomous driving; second, Ideal as an organization has its own methodology and processes, such as doing the right but difficult thing. It sounds like a platitude, but it is crucial.

Li Xiang would never say, "Why has Lang Bo overturned what we did before?" We explained to him clearly why we wanted to do this: we want to win with an AI strategy, and we have found the dual-system paradigm. He understood immediately and would only say, "End-to-end is great, we have to do it quickly."

What artificial intelligence needs is computing power and data. Li Xiang often came to ask me, "Lang Bo, do you have enough computing power? If not, ask Xie Yan to get you some."

Li Xiang said, "We have cars and more money than others, so we have a great opportunity to widen the gap with others on this road. So don't do patchwork, and hurry up to work on the AI ​​behind it."

Talking about the future of intelligent driving: End-to-end + VLM is the best paradigm for artificial intelligence

36Kr Auto: Some companies that never did map-free work believe that end-to-end is a chance to change lanes and overtake. Is that true?

Lang Xianpeng:You are half right. Changing lanes to end-to-end is indeed possible. Whether with a map, with NPN, or without a map, the core of the old solution is the same: remove the map, strengthen perception, stack the small modules into a few larger models, and evolve bit by bit within the same framework.

But end-to-end is different. It is the first time artificial intelligence is really used to do autonomous driving. With a One Model end-to-end system, the input is just data, the output is a trajectory, and the intermediate modules are folded into a single model.

The entire R&D process is completely different. In the traditional product development model, the driving force is requirement design or problem feedback: when a bug occurs, someone manually designs a fix, iterates, and verifies it. That approach no longer works here.

End-to-end is a black box; its capabilities depend entirely on the data it is given. We are now screening data from experienced drivers. If the data is not good, the resulting model will not be good: garbage in, garbage out. It is a data-driven training process. What used to be a product-feature development process is now a capability-improvement process.

So changing lanes to end-to-end is no problem, but to actually overtake you need data and training compute. Without those two prerequisites, to be honest, everyone has a model, and the models themselves will not differ much. Even the best model, without data and computing power, is just a pile of parameters.

36Kr Auto: Ideal has accumulated a lot of data, but He Xiaopeng recently pointed out that having more data does not mean that autonomous driving can be achieved. What do you think?

Lang Xianpeng:Our training data consists of clips: complete data from a driver driving for tens of seconds, including the visual sensors, the vehicle's state at the time, and the accelerator and brake inputs.

But data must be high quality to be useful. What does high quality mean? Together with the product and vehicle subjective-evaluation team, we defined a standard called the "high-quality human driver". Some people drive every day and consider themselves highly skilled, but if they are always accelerating or braking hard, triggering AEB, or yanking the steering wheel, they may not actually drive well.

By these standards, only 3% of our 800,000 car owners count as "high-quality human drivers". Together with the high-quality data accumulated earlier, this yields millions of clips, all of them the essence. He Xiaopeng is right that high-quality data is needed, but data quality rests on absolute data scale.
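As an illustration of what such a screening rule might look like in practice, here is a minimal sketch that keeps only clips meeting the criteria mentioned above (no harsh acceleration or braking, no AEB triggers, no abrupt steering). The thresholds and field names are invented placeholders, not Ideal's actual standard.

```python
# Illustrative filter for "high-quality human driver" clips, based on the
# criteria mentioned in the interview. All thresholds are invented placeholders.
from dataclasses import dataclass

@dataclass
class Clip:
    max_abs_accel_mps2: float      # peak |longitudinal acceleration| in the clip
    aeb_triggered: bool            # automatic emergency braking fired
    max_steering_rate_dps: float   # peak steering-wheel rate, degrees/second

def is_high_quality(clip: Clip,
                    accel_limit: float = 3.0,
                    steering_rate_limit: float = 180.0) -> bool:
    """Return True if the clip looks like smooth, experienced driving."""
    return (clip.max_abs_accel_mps2 <= accel_limit
            and not clip.aeb_triggered
            and clip.max_steering_rate_dps <= steering_rate_limit)

clips = [Clip(2.1, False, 90.0), Clip(5.4, False, 60.0), Clip(1.8, True, 45.0)]
training_set = [c for c in clips if is_high_quality(c)]
print(f"kept {len(training_set)} of {len(clips)} clips")  # kept 1 of 3 clips
```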

36Kr Auto: After end-to-end, does the data tool system need to be upgraded?

Lang Xianpeng:The tool chain has changed a lot. In the old product-feature development process, a user would take over, the data would be sent back, the problem would be analyzed manually, the code modified, the vehicle tested, and the release pushed online. That data closed loop was already fairly efficient, but it still took several days and involved many people. The more testing we did, the more problems surfaced and the more people were needed to fix them.

The current process is that when a car owner takes over and the data comes back, the world model automatically generates similar scenes, which go into the wrong-question bank. We also check whether similar data already exists in it; if not, we mine the existing database and use it for joint training.

After the new model is trained, it is returned to the world model test system for two tests. The first test is to see if the wrong questions were answered correctly, and the second test is a set of real questions to test ability. If there are no problems in both tests, the model is ready. In the most extreme case, there is no human in the middle, and it is a very automated closed-loop process.
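To make the shape of this loop concrete, here is a toy, runnable skeleton of the process just described. Every function is a trivial stand-in for a much larger system (scene generation, training, and the two exams are all placeholders); only the control flow mirrors the description above.

```python
# Toy skeleton of the automated closed loop: takeover -> generate similar
# scenes -> wrong-question bank -> retrain -> two exams -> release if passed.
import random

PASS_MARK = 0.9

def world_model_generate_similar(event, n=5):
    # Stand-in for the world model synthesizing scenes similar to a takeover case.
    return [f"{event}-variant-{i}" for i in range(n)]

def retrain(model, cases):
    # Stand-in for mining similar clips and joint training; nudges a scalar "skill".
    return {"skill": min(1.0, model["skill"] + 0.02 * len(cases))}

def exam_score(model, question_bank):
    # Stand-in for the world-model examiner: a noisy pass rate on the questions.
    return min(1.0, model["skill"] + random.uniform(-0.05, 0.05))

def closed_loop_iteration(takeover_event, error_bank, real_exam, model):
    # 1. Add generated scenes to the wrong-question bank.
    error_bank.extend(world_model_generate_similar(takeover_event))
    # 2. Retrain on the enlarged bank.
    candidate = retrain(model, error_bank)
    # 3. Exam 1: the wrong-question bank.  4. Exam 2: the fixed real-question set.
    if exam_score(candidate, error_bank) >= PASS_MARK and exam_score(candidate, real_exam) >= PASS_MARK:
        print("both exams passed -> push to users")
    return candidate

model = {"skill": 0.88}
model = closed_loop_iteration("takeover@roundabout", [], ["real-q1", "real-q2"], model)
```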

36Kr Auto: End-to-end is a black-box training process, so a lot of code has to be added as a fallback. How much work does that fallback require?

Lang Xianpeng:Very little. The version with maps has about 2 million lines of code, the map-free version has 1.2 million lines, and end-to-end totals only 200,000 lines, about 10% of the original.

We do keep some fallback rules for control, because the planning trajectory the end-to-end model outputs from sensor data can occasionally go wrong, so we have some hard rules to block abnormal control behavior, such as suddenly turning the steering wheel 180 degrees.
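For illustration, a fallback of this kind can be as simple as a post-processing check on the model's output before it reaches the actuators. The sketch below clamps an implausibly large steering jump; the limit value and interface are invented, not Ideal's actual rules.

```python
# Illustrative fallback rule applied after the end-to-end model: clamp planned
# commands that imply physically abnormal behaviour, such as a near-180-degree
# steering jump between planning cycles. The limit is an invented placeholder.
MAX_STEERING_STEP_DEG = 30.0   # max allowed change per planning cycle (placeholder)

def apply_fallback(prev_steering_deg: float, planned_steering_deg: float) -> float:
    delta = planned_steering_deg - prev_steering_deg
    if abs(delta) > MAX_STEERING_STEP_DEG:
        # Clamp the abnormal command instead of passing it to the actuators.
        delta = MAX_STEERING_STEP_DEG if delta > 0 else -MAX_STEERING_STEP_DEG
    return prev_steering_deg + delta

print(apply_fallback(0.0, 180.0))   # -> 30.0, not a sudden half-turn of the wheel
```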

36Kr Auto: Musk said that 300,000 lines of code were deleted. You seem to be more radical. If there are more and more problems after the push, will the code be added back?

Lang Xianpeng:I think it will not change much. The main reason is that we have the ability to continuously iterate.

36Kr Auto: Ideal has always had two lines: mass production and pre-research. From pre-research to mass production end-to-end, what is the current pre-research?

Lang Xianpeng:L4. This goes back to our understanding of artificial intelligence. We found that if we want to achieve true autonomous driving, it will be quite different from what we do now.

End-to-end behaves exactly the way its data teaches it to; if it has never seen similar data, it will not know what to do. People are different: if I can drive in Beijing, I can also drive in the United States. To truly move to autonomous driving, the system must understand things the way humans do and be able to reason.

We studied how the human brain works and thinks. Around August and September last year, Jia Peng and Zhan Kun came across dual-system theory, which is a very good framework for human thinking. Treating artificial intelligence as a dual system, System 1 provides fast responses, and System 2 provides the logical thinking that handles unknown situations well.

These are all theoretical aspects. When it comes to autonomous driving, the end-to-end model is System 1, and System 2 is the VLM visual language model. This is the best solution for implementing artificial intelligence in the physical world.

So how do we measure the capabilities of System 1 and System 2? We also have a world model, which is actually called System 3 internally. We have a very clear usage of the world model, which is to test System 1 and System 2. It is an examiner.

We have a real test bank, which is the real data of normal driving. The world model is a generative model that can generate other questions based on existing data. When a model is trained, we do the real test once, and then do a few sets of simulation test questions to see how many points we get. Each model will have a score, and the higher the score, the more powerful the model is.

36Kr Auto: Under what circumstances will System 2 be triggered?

Lang Xianpeng:System 1 and System 2 are always running. In scenarios that are more complex and that System 1 may not recognize well, such as overpasses, puddles, or freshly poured concrete, System 2 takes effect, just at a lower frequency of around 3-4 Hz, while System 1 runs at more than 10 Hz. Much like querying GPT, System 1 keeps asking System 2 what to do when it encounters such a scene.
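To illustrate the scheduling idea in the answer above, here is a toy, single-threaded sketch in which the fast system plans on every tick while the slow system is consulted only every few ticks, and the fast system always uses the latest available suggestion. The trigger logic, rates, and messages are simplified placeholders, not Ideal's actual implementation.

```python
# Toy illustration of two systems at different rates: System 1 (end-to-end)
# plans every tick (~10+ Hz); System 2 (the VLM) is consulted every third tick
# (~3-4 Hz), and System 1 always uses the latest suggestion available.
SYSTEM2_EVERY_N_TICKS = 3   # roughly a 10 Hz : 3-4 Hz ratio

def system1_plan(scene, suggestion):
    # Fast path: the end-to-end model outputs a trajectory on every tick.
    hint = f" (hint: {suggestion})" if suggestion else ""
    return f"trajectory for {scene}{hint}"

def system2_vlm(scene):
    # Slow path: stand-in for the VLM giving a driving suggestion.
    return "slow down, construction ahead" if "construction" in scene else "proceed normally"

latest_suggestion = None
scenes = ["straight road", "construction zone", "construction zone", "roundabout"]
for tick, scene in enumerate(scenes):
    if tick % SYSTEM2_EVERY_N_TICKS == 0:
        latest_suggestion = system2_vlm(scene)       # System 2, lower frequency
    print(system1_plan(scene, latest_suggestion))    # System 1, every tick
```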

36Kr Auto: Does System 2 VLM itself have capability limits?

Lang Xianpeng:You can think of it as a large language model. Some large language models are good at mathematics, some at coding; they have different strengths. We focus on feeding it driving-related material: traffic regulations, instructional videos, and the textbooks for Subjects 1-4 of the driving test. Our VLM is essentially a large language model specialized toward driving.

In the short term there is knowledge it lacks, but as the closed loop spins faster and faster, its ceiling will keep rising. The end-to-end model has only a little over 300 million parameters, while the VLM has 2.2 billion.

36Kr Auto: So the bigger variable in the future of intelligent driving will be System 2?

Lang Xianpeng:The underlying support is System 1, but as we move forward, including L3 and L4 autonomous driving, we must have very powerful System 2 capabilities. The current 2.2 billion parameters may not be enough, and we have to increase them.

Jia Peng:System 2 mainly handles complex scenarios. With 2.2 billion parameters the response time is 300 milliseconds, which is acceptable inference latency for difficult scenes. But for System 1 that would definitely not be enough; System 1 needs tens of milliseconds.

36Kr Auto: Is there an upper limit on model parameters? For example, 8 billion? What is the approximate requirement for chip computing power?

Jia Peng:Just like the large language model, no one can answer how large the parameters should be.

Lang Xianpeng:We now have both the theory and the technique. System 1 plus System 2 is a very good artificial intelligence paradigm, but we still need to explore how to implement it specifically.

36Kr Auto: If the segmented end-to-end model is to evolve into One Model, does it have to be rebuilt from scratch?

Jia Peng:The challenge is quite big. Our map-free system is effectively a segmented setup with two models. The first challenge is technical: with the traditional system gone, how do we train the model to achieve good results? The second is the people: how do people from different backgrounds in perception and in planning and control work together to build one model?

Our team also struggled a lot. When it comes to end-to-end, the roles of many people may change. People who used to work in engineering may define data and scenarios. Changing one's role is still quite challenging.

Talking about the business closed loop: without $1 billion in profit, you can't afford autonomous driving

36Kr Auto: It sounds like money is being burned. How much do you plan to invest in end-to-end?

Lang Xianpeng:At present it is about 1 billion RMB, and in the future training the autonomous driving model may require 1 billion US dollars, not counting other expenses such as buying cards, electricity, and talent. A company without $1 billion in net profit cannot afford it.

36Kr Auto: End-to-end may be a watershed technology in the automotive industry. From the perspective of a closed business loop, how is the commercial performance of intelligent driving?

Lang Xianpeng:Since version 6.0, over the past one to two months, AD Max has accounted for more than 50% of our orders, with an increase of more than 10% every month. A 2-3% change can be read as normal jitter, but more than 10% is real growth. In Beijing, Shanghai, Guangzhou, and Shenzhen, the share of our intelligent driving models has reached 70%. AD Max orders reached 75% for the L9, 55% for the L8, and 65% for the L7.

Jia Peng:L6 also has 22%. For young people buying cars, intelligent driving has become a very important factor. After using intelligent driving, it is difficult to go back to the original state.

Lang Xianpeng:Highway NOA is now well recognized, but urban NOA is still at a very early stage. To a large extent, the urban product strength is not good enough: even the map-free solution has hit its ceiling, and it still does not compare well with the comfort of human driving. With end-to-end, everything changes, and some of its behavior is already quite close to a human driver's.

With more data and computing power, urban intelligent driving built on the end-to-end architecture is likely to reach the experience level of highway driving. At that stage, it will be a big factor in users' car-purchase decisions.

36Kr Auto: The commercial value of intelligent driving is becoming increasingly obvious, but Ideal’s intelligent driving function has always been free. Will you re-discuss the strategy to make the commercial value more prominent?

Lang Xianpeng:Many people buy Ideal for refrigerators, TVs and sofas, but in the future they may also buy Ideal for intelligent driving, which is enough to show the commercial value of intelligent driving. The difference between the Max and Pro versions is really 30,000 yuan.

As for software charges, if it reaches the L4 level, it is really amazing. Imagine that it can help users pick up their children from school. Are you willing to pay for this service? As the capabilities improve, some additional business models will emerge, but the premise must be that the intelligent driving capabilities are greatly improved.

36Kr Auto: Xiaopeng mentioned that it plans to achieve an experience similar to Google Waymo’s in the next 18 months. Do you have such a timetable?

Lang Xianpeng:If the data and the business can support the goal, it is possible. We have calculated internally that, never mind L3/L4, just supporting VLM and end-to-end training will probably require dozens of EFLOPS of cloud computing power.

Xiaopeng has 2.51 EFLOPS and Ideal has 4.5 EFLOPS. Doing this takes at least 10 EFLOPS of computing power, which is about 1 billion US dollars, or 6 billion RMB, per year. If you can spend that every year, you can try.
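A quick back-of-envelope on the numbers quoted here (roughly 10 EFLOPS of cloud training compute costing about $1 billion, or 6 billion RMB, per year) shows the unit costs those figures imply; the calculation below only rearranges the interview's own numbers.

```python
# Back-of-envelope arithmetic derived solely from the figures quoted above.
target_eflops = 10
annual_cost_usd = 1e9   # ~1 billion USD per year, as quoted
annual_cost_rmb = 6e9   # ~6 billion RMB per year, as quoted

print(f"implied cost per EFLOPS-year: ~${annual_cost_usd / target_eflops / 1e6:.0f}M")
print(f"implied USD-to-RMB rate in the estimate: ~{annual_cost_rmb / annual_cost_usd:.1f}")
print(f"Ideal's cited 4.5 EFLOPS vs. the 10 EFLOPS target: {target_eflops - 4.5:.1f} EFLOPS short")
```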

36Kr Auto: In addition to computing power, based on the current technical architecture, how much investment will the intelligent driving team need on average per year?

Lang Xianpeng:The major expenses are training chips, data storage, and data traffic, at least 1-2 billion US dollars a year. Going forward, especially with the world model, the ultimate goal is to reconstruct the entire real physical world, and that also requires training and a great deal of computing resources.

As for what the upper limit is, I can’t imagine it now, but it must be at least more than 10 EFLOPS. Musk said it would be hundreds of EFLOPS, and we don’t think he is talking nonsense.

36Kr Auto: Car companies are still following the profit model of the manufacturing industry. There will be a price war this year, and profits will be affected. Is it more appropriate for car companies to do what technology companies do?

Lang Xianpeng:Whoever can get high-quality data and has enough computing power for training can build a good large model. We may not need a huge number of people, but we must have the right talent. Besides Ideal, Huawei, and Tesla, who else has all three? I cannot think of anyone.

Our current priority is to help the company sell cars as quickly as possible. Only by selling cars can we have the money to buy cards and train intelligent driving models.

The further intelligent driving advances, the bigger the gap will become. Before, whether with maps or without, everyone was working on something with a ceiling. To break through, you have to bring in AI, and then what everyone competes on is data and computing power. Those who cannot solve that can only keep grinding in the old dimension, while we cross over to the next dimension and reap the data dividend.

36Kr Auto: With intelligent driving technology changing so quickly and requiring such huge investment, how does Li Xiang stay on top of intelligent driving?

Lang Xianpeng:He would talk to me and Mr. Jia at any time. Since last September, we have had a weekly meeting on artificial intelligence, which brings together all the people in the company who are related to AI, including those in smart space, infrastructure, and training platforms. Li Xiang's understanding of artificial intelligence is still very good.

He also has other resources and knows a lot of people. He has talked with Lu Qi, Kimi CEO Yang Zhilin, Horizon Robotics CEO Yu Kai, and others. He not only grasps the core essence and the actual technology of AI, but can also express it in fairly plain language.

36Kr Auto: How much manpower is needed for end-to-end model design? What is the average size of a future intelligent driving team?

Jia Peng:Maybe not that many are needed. Tesla actually has very few elite model builders; its vision team has only about 20 people. You can also work backwards: for example, on the OrinX chip the model has to run at 12-15 Hz, which basically determines how many parameters it can have and what model structure to train. A few people can roughly define that.

Lang Xianpeng:Tesla is more extreme, with a software algorithm team of more than 200 people, but it only builds one chip and a few car models. We cannot be that extreme yet; we are still several times their size. Because our chip platforms differ and we have many car models, even though we do not hire a huge number of people, we still need some people on each front.

36Kr Auto: Cloud computing power will be a huge investment in the future. Have you considered replacing domestic chips? Will it be difficult to switch?

Jia Peng:On the vehicle side we have already used Horizon's J3 and J5. On the cloud side we are trying some domestic products, but the biggest difficulty at present is that their ecosystems are not as mature. NVIDIA's CUDA ecosystem is simply too dominant, and adapting to another ecosystem is very troublesome. For now we still put efficiency first while keeping an eye on domestic progress; we have already started exchanges and trials.

36Kr Auto: After the independently developed intelligent driving chip comes out, what effect will it have when combined with end-to-end systems?

Jia Peng:Combining software and hardware will definitely give better results, and Tesla has already set the example. Their chips are cheaper, deliver more computing power, and support AD better. They wanted to increase the parameter count fivefold in FSD v12.5. That is indeed a great advantage.

Lang Xianpeng:The prerequisite is that the L3 and L4 algorithms must be determined.

36Kr Auto: Will there be a time point for L4 autonomous driving?

Lang Xianpeng:It will take about 3-5 years. We will first complete L3, which is the stepping stone to L4. First, it lets us better understand L4's computing-power and data requirements, including the basic capabilities of the examination system and the data closed loop.

Second, from the product perspective, we need to build a trusting relationship with people. Because the end-to-end system is still a black box, people don’t trust the system to some extent. So through L3 products, we can build a good trusting relationship with people.

36Kr Auto: Many AI technologies originated in Silicon Valley. Ideal used to follow Tesla and is now also doing cutting-edge exploration. How do you make sure your technical judgment stays accurate and sharp, and that you don't go down the wrong technology tree?

Lang Xianpeng:We already have a complete system. L4 will take another 3 to 5 years but we have already started to explore it. If we make a mistake, it’s better to make it early and there is still a chance.

There is indeed a divide in artificial intelligence between China and the United States, but China actually has a lot of talent. We try our best to find the best young people; this year, for example, we recruited more than 240 new graduates, all from universities ranked in the QS top 100.

Talking about Tesla: Learn from Tesla and Surpass Tesla

36Kr Auto: Some people say that the gap between China and Tesla’s intelligent driving is 2 years. What do you think?

Lang Xianpeng:Definitely not. We won’t comment on the technical solution, because Tesla hasn’t said much about its technical solution in the past two years. From the product experience point of view, we are basically at the level of the end-to-end version just released by Tesla last year. The gap is about half a year.

36Kr Auto: Tesla has also encountered some problems. Musk said that the data and feedback have decreased. How do you avoid them?

Lang Xianpeng:These are different stages. When we encounter them, it means we have entered the next stage.

Jia Peng:Tesla's biggest problem right now is verification. You can see that v12.4 (a Tesla FSD version) did not perform well, and then v12.5 was released with the parameter count increased fivefold. I suspect the verification step was not done very well: when the model came out, they did not really know how it would perform once it reached users.

This is why we emphasize the world model. We have learned these lessons and must do the verification in advance. Otherwise, how can the model be verified for all roads in the country, including in parks and communities?

If you look at Tesla's 2022 AI Day, it is still very traditional simulation, and the scalability is too poor to support full opening in North America. In this regard, we have indeed learned some lessons from Tesla. That's why we put so much effort into making a world model.

36Kr Auto: In the process of building the end-to-end solution, are there any parts that you find difficult? For example, the data tool chain?

Jia Peng:The data set has been built up since 2019, and it is at least the best in China. Data and training are routine by now; there are paradigms to follow. Right now, verification is the biggest challenge.

The other is the VLM itself, which will gradually play a bigger role. At the beginning it may be used in only 5% of cases, but later, once end-to-end reaches its ceiling, the rest of the product experience will have to be iterated through the VLM. That is a challenge for the future.

This is also where we differ from Tesla. We developed the VLM and the world model because we saw Tesla's problems; v12.4 had a verification problem. We drove FSD in North America twice, each trip about a week, on both the West Coast and the East Coast. It was obvious that it performed very well on the West Coast and much worse on the East Coast. Boston and New York were not good, because those two cities are far more complicated than the West Coast.

On the East Coast, Tesla's average takeover rate is quite high; the end-to-end ceiling may be right there. With the VLM, we want to break through that ceiling. The VLM's ceiling is very high, and it is possible to surpass Tesla by taking this path.