
Spending 1 billion a year is just the beginning. Can the "mythologized" end-to-end approach and the "best practices" of China's autonomous driving circle make money?

2024-09-17


Author: Hua Wei

It is expected that Tesla FSD will officially enter China in less than half a year. On September 5, Tesla announced that FSD will launch in China and Europe in the first quarter of 2025.

Not long ago, the end-to-end Tesla FSD V12 drew wide praise from inside and outside the industry after its release. XPeng Motors chairman He Xiaopeng wrote that Tesla's autonomous driving "performs very well" and excitedly declared that "2025 will be the ChatGPT moment for fully autonomous driving!"

Large models, represented by GPT, are profoundly reshaping how autonomous driving solutions are researched and developed, thanks to their unprecedented pace of innovation and their technical architecture, and the global industry is responding quickly to this trend. Judging from the current focus of domestic automakers, end-to-end has also become their technical route for the next generation of autonomous driving.

In passenger cars, autonomous driving players such as Huawei, XPeng, Pony.ai, Momenta, Jijia Technology, and Horizon Robotics are actively following suit and have launched end-to-end autonomous driving solutions and models aimed at mass production. In commercial vehicles, Zero One Auto has announced a clear timetable for putting end-to-end large models on vehicles. Li Auto founder and CEO Li Xiang has also publicly stated that Li Auto will achieve L4 autonomous driving within three years based on end-to-end and world models.

Even the L4 autonomous driving market, which had been through a "cold wave", has recovered with the arrival of end-to-end technology. Wayve, which raised $1 billion in financing on the strength of this technical concept, is a good example. Liu Yudong, investment manager at Chentao Capital, said, "End-to-end has opened up a second growth curve for L4 commercialization."

Tesla, whose FSD capabilities have leapt forward with its end-to-end solution, has also announced that it will launch its Robotaxi model on October 10. He Xiaopeng has publicly revealed that XPeng Motors will launch a Robotaxi in 2026. However, the recent moves by automakers and autonomous driving companies, and their expectation of mass-producing L4 through end-to-end solutions, have drawn doubts from many autonomous driving practitioners: is end-to-end being overly "mythologized"?

Why has end-to-end suddenly become the biggest star in the smart driving circle?

End-to-end did not just emerge in the past two years. As early as 2017, many companies were exploring this technical route. This year, "end-to-end" has caught fire in the autonomous driving circle, and the industry regards it as a killer technology. Beyond the innovation brought by large language models such as ChatGPT, its popularity also owes much to its own "charm".

"The birth of the end-to-end model is the only way for autonomous driving technology to move towards large-scale commercialization." Pony.ai co-founder and chief technology officer Lou Tiancheng said that one of the biggest advantages of the end-to-end model is its generalization, which can speed up the commercialization of autonomous driving and accelerate its popularization.

According to Wang Panqu, head of intelligent driving at Zero One Auto, traditional non-end-to-end autonomous driving systems not only generalize poorly compared with end-to-end; when they are extended to new scenarios, many previously used rule-based solutions also stop working. The newly added code makes the system harder to maintain, which drives up marginal costs.

In addition, traditional autonomous driving systems have two disadvantages. The first is architectural complexity. Multi-module systems not only cost more to develop, but also have a lower performance ceiling because each module is allocated only a small share of computing resources. Communication between modules also creates many engineering optimization problems. The second is the high cost that this complexity brings. Each module needs to be developed, maintained, project-managed, and integrated, which is why the teams at traditional autonomous driving companies are so large.

"In my opinion, end-to-end can solve these problems very well." Wang Panqu said that architecturally, end-to-end has only one module, which resolves the complexity problem and also reduces costs and increases efficiency. End-to-end driven by data, or even by knowledge, generalizes very strongly and is very likely to reach mass production quickly. That can cut the cost of adapting L2 to different vehicle models to a very low level and can also shorten the time L4 needs to adapt to different scenarios.

In addition, Lou Tiancheng pointed out that the biggest benefit of end-to-end is that it prevents information from being lost between modules and functions. Mao Jiming, vice president of engineering at Jijia Technology, made the same point and explained that it comes down to how effectively information is passed between modules. The interface between upstream and downstream modules sets the upper limit on the information that can be transmitted, and no matter how sophisticated the interface design is, some information will be lost. A unified end-to-end module has no such loss, which helps improve the final performance of the algorithm.

Mao Jiming also described several other advantages of the end-to-end architecture. The first concerns module error. Because end-to-end is a single module, errors are not amplified as they pass through multiple modules, so the ceiling of the overall intelligent driving algorithm can be maximized. Second, in a multi-module architecture each module has its own R&D rhythm and optimization goals, which are not always strictly aligned with the global optimization goal of the whole intelligent driving system, resulting in potentially wasted optimization and R&D resources. An end-to-end architecture has only one module and a clear, unified optimization goal, which effectively avoids this kind of internal friction.
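The error-amplification argument can be illustrated with a toy calculation. The sketch below is a deliberate simplification: it assumes each module in a serial pipeline is correct independently with a fixed probability, and the 99% figures are hypothetical rather than numbers from any company quoted here.

```python
from math import prod


def pipeline_success_rate(module_success_rates):
    """Probability that every module in a serial pipeline is correct.

    Assumes (hypothetically) that module errors are independent, so the
    per-module probabilities simply multiply.
    """
    return prod(module_success_rates)


# Hypothetical figures: perception, prediction, and planning each 99% reliable.
chained = pipeline_success_rate([0.99, 0.99, 0.99])
single = pipeline_success_rate([0.99])

print(f"three chained modules: {chained:.4f}")
print(f"one module:            {single:.4f}")
```

Under these assumptions the chained pipeline is less reliable than any one of its stages. Real systems are not this simple, since errors can be correlated or corrected downstream, and a single end-to-end model does not automatically match the reliability of one well-tuned module.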

Another point is that the components of a modular architecture naturally tend to form multiple rule-driven "domains", which brings a series of maintenance challenges and makes corner cases hard to resolve. End-to-end, as a typical fully data-driven architecture, pushes developers to approach problems with a data-driven and model-driven mindset, which raises the level of understanding across the whole algorithm team.

"Overall, end-to-end system development is more efficient and consumes fewer resources." Liu Yudong said that the purely data-driven end-to-end development paradigm removes much of the heavy engineering investment previously required and shifts a company's resources toward high talent density and the accumulation of data.

The user value that end-to-end brings has also attracted attention. Liu Yudong pointed out two aspects. First, in long-tail scenarios, an end-to-end system can cover more extreme cases than the original system, for example those that require common-sense reasoning. Second, the system behaves in a more human-like way, which helps build stronger trust between consumers and the system. In scenarios that involve heavy interaction and negotiation with other road users, end-to-end behaves more like a human driver.

High ceiling, low floor: has the "endgame" of autonomous driving not arrived yet?

Although the technical advantages of end-to-end are significant and a number of car companies and autonomous driving companies are actively adopting it, opinions in the industry still differ on whether it is the so-called "endgame".

Wang Panqu, a firm believer, said, "I believe end-to-end must be the ultimate form of autonomous driving, but end-to-end is only a technical framework. There are many options for how to implement it, and the industry has not yet reached a consensus."

Rationalists such as Mao Jiming point out that the end-to-end solution has a "high ceiling but a low floor". In plain terms, done well it can achieve very good results, but done badly it will be worse than the traditional solution. In Mao Jiming's view, whether to choose end-to-end depends on the specific application scenario. For L5 autonomous driving, end-to-end is the only solution; for L2 and L3, it is just one of several feasible solutions. Moreover, end-to-end needs to be combined with other technical solutions when it is applied.

"End-to-end provides a good technical path for the rapid, large-scale popularization of autonomous driving. Whether it is the final outcome remains to be verified by time." Lou Tiancheng holds a similar view. He believes that both L2 and L4 autonomous driving have already been realized, but that the quality and scope of each place different requirements and standards on the technology.

For L2 autonomous driving, end-to-end is currently the better path; for L4, end-to-end can help it expand quickly into new areas. However, L4 has higher safety requirements: it must be more than 10 times safer than a human driver. So in addition to end-to-end, it is also necessary to take driving intentions and application scenarios into account and to integrate highly deterministic instructions such as traffic regulations and driving preferences.

Liu Yudong drew a more cautious conclusion: "At present, end-to-end is the final outcome of autonomous driving for the foreseeable future, but there are various possibilities for longer-term technological evolution. Just as we couldn't imagine a technology like ChatGPT three years ago, a new technical architecture may appear in two or three years that overturns today's ChatGPT."

100% end-to-end has not yet appeared. What is the "best practice"?

Although it is not clear whether end-to-end is the final solution for autonomous driving, applying it has clearly become the consensus in the intelligent driving industry. The choice of technical path for end-to-end autonomous driving, however, remains widely disputed.

Zero One Auto is currently pursuing an end-to-end route based on a multimodal large language model. It has achieved results on some public datasets, and its purely vision-based autonomous driving solution also took second place among 143 international teams in the end-to-end autonomous driving track of this year's international autonomous driving challenge organized by the Shanghai Artificial Intelligence Laboratory and CVPR.

Wang Panqu believes that modular end-to-end amounts to an early exploration that can be implemented more quickly, and that academia and industry already have relatively mature solutions for it. The end-to-end route based on multimodal large models has the potential to turn autonomous driving into a profitable business, because only a highly generalized base model can provide the knowledge injection and integration that autonomous driving requires.

In short, the strong generalization of large models will give the whole end-to-end system a performance advantage and will make profitable, large-scale mass production of high-level autonomous driving possible in the future. In addition, the two end-to-end technical routes, one based on multimodal large models and the other on world models, can be reused in the future.

Liu Yudong said that, in principle, One Model is closer to the form AGI takes in other fields, while the world model is currently mainly a tool for data generation, and it will take more time to see whether it can serve as an autonomous driving system. Over the next two years there will be two main end-to-end solutions: one is modular end-to-end, typified by UniAD from the Shanghai Artificial Intelligence Laboratory; the other is One Model end-to-end based on multimodal large models, such as Wayve's LINGO-2 and Li Auto's recently launched DriveVLM.

Mao Jiming takes a different view of the world model. He believes it is a reasonable end-to-end solution. With a world model, the intelligent driving algorithm can understand the scene, make reasonable predictions about the future, and make decisions based on that information, which is closer to the logic of human thinking.

Zhu Zheng, co-founder and chief scientist of Jijia Technology, added that training One Model is very resource-intensive and time-consuming and places very high demands on the scale and quality of data. End-to-end uses the model's predictive capabilities to make decisions about scene perception and driving behavior, which is consistent with how humans drive. According to him, Jijia currently has a basic end-to-end prototype system based on the world model, is working with a car manufacturer on joint verification on vehicles, and will disclose some progress soon.

In August last year, Pony.ai merged the three traditional modules of perception, prediction, and planning and control into a One Model end-to-end autonomous driving model, which has now been installed on L4 autonomous taxis and L2 assisted-driving passenger cars. In Lou Tiancheng's view, both modular end-to-end and One Model are at an early stage and have not yet been verified through mass production and delivery. He expects the end-to-end technical route to move from divergence to consensus over the next one to two years.

"In the long run, end-to-end will ultimately converge on One Model." Mao Jiming said that, for now, the "two-stage" end-to-end adopted by Huawei, XPeng, and other companies is still a semi-end-to-end implementation, a transitional state rather than the complete one.

Not long ago, Jiyue CEO Xia Yiping also said publicly, "There is no company in the market now that is truly end-to-end; it's all marketing gimmicks." It is understood that Jiyue's current end-to-end intelligent driving solution also adopts a "two-stage" technical architecture.

The "black box" label is a misunderstanding: end-to-end can be made more like a gray box or a white box

The advantages of the end-to-end solution come from an architecture that merges multiple modules into one, but this design also makes the system closer to a "black box" than the original, understandable "white box", and therefore harder to explain.

Lou Tiancheng believes that unexplainability is an inherent defect of end-to-end systems, but whether it will limit the development of end-to-end autonomous driving depends on the situation. For L2, unexplainability does not stand in the way of end-to-end applications. Modular end-to-end, for example, still retains the main functional modules, and its intermediate output features can be extracted as interpretable data.

For L4, the requirements for safety and certainty are much higher than for L2. It is therefore necessary to integrate rule-based instructions such as traffic regulations and driving preferences into the model, to help the end-to-end autonomous driving model better understand driving intentions. The model's capabilities also need to be upgraded so that it can output its driving intentions to the outside world, which further improves interpretability.

In Zhu Zheng's view, although end-to-end is indeed a black box at the product level and in its final R&D form, from the perspective of engineers, product designers, and users it can be made more like a gray box or a white box.

First, modular joint end-to-end keeps the three modules of perception, prediction, and planning distinct, so any planning result can be traced back to an earlier intermediate module. Second, One Model can output modular intermediate results. Annotating these results for intermediate supervision helps One Model converge better, and the intermediate results can also be shown to engineers or users. Third, the most important feature of the world model is its predictive ability, and its predictions can likewise be linked to those intermediate results.

Mao Jiming said that the current "black box" label reflects a misunderstanding of the training and inference details of the model as a whole. As long as developers' understanding of the model is presented in a form that can be explained to the outside world, it is no longer a black box.

Wang Panqu also believes that the question of unexplainability really reflects how much the public trusts the technology, that is, whether its performance has reached a standard everyone can accept. As data-driven methods, algorithm design, large-model safety, and related technologies develop, end-to-end performance and reliability will make a huge leap in the next one to two years. Once performance has been fully verified through large-scale testing, explainability will no longer be a key issue.

The "peak" of end-to-end vehicle deployment is coming, and commercial vehicles will get there faster

"The large-scale market launch of modular end-to-end is a matter of the coming year, while end-to-end based on large language models will take an additional 1 to 2 years." Wang Panqu pointed out that L4 autonomous driving will definitely launch faster for commercial vehicles than for passenger vehicles. The reason is that high-level autonomous driving systems capable of mass production are very sensitive to the difficulty of the deployment scenario. Commercial vehicle scenarios are simpler than passenger vehicle scenarios, and a single scenario makes it easy to close the commercial loop, which is also convenient for scenario asymmetry.

Liu Yudong is more optimistic, believing that both modular end-to-end and One Model end-to-end will begin to be rolled out more intensively next year. He added that, judging by how aggressively the technology is being developed, the concentration of talent, the speed of iteration, and the difficulty of application, end-to-end may reach real deployment in commercial vehicles and passenger vehicles at about the same time. The scope of deployment will be larger in passenger vehicles, however, and commercial vehicles will take off gradually at a later stage.

"Several hurdles must be overcome before end-to-end mass production. The first is the preparation of vehicle-side computing power, the second is the iteration of end-to-end algorithms, the third is the scale of cloud data, the fourth is the scale of computing power, and the fifth is the verification plan," Mao Jiming said.

In his opinion, Tesla and leading domestic OEMs and companies such as NIO, XPeng, Li Auto, and Huawei already meet three of these requirements: vehicle-side computing power, cloud data scale, and cloud computing power scale. Between the end of this year and the first half of next year, the end-to-end algorithms of several leading car companies will be ready for deployment on vehicles at scale; from the second half of next year, the industry will see an explosion of end-to-end mass production.

Does entering the end-to-end market mean "starting over again"?

The development and adoption of end-to-end systems will undoubtedly bring a technological revolution to intelligent driving solutions as a whole. So does entering the end-to-end market mean throwing out previous technologies and starting over?

Liu Yudong believes that existing autonomous driving technology will not be completely overturned, and that end-to-end will share some of its accumulated algorithms and software.

The first area is perception. The front-end camera processing in many end-to-end systems uses BEV methods, for example in the backbone or encoder. The second is planning and control, where some existing know-how can be migrated to the end-to-end system. The third is data infrastructure, an important capability for any company that wants to do end-to-end in the future. Companies that do BEV solutions well also tend to have strong data infrastructure.

In Mao Jiming's opinion, whether existing work is overturned depends on what the previous technical solution was. If an intelligent driving company's previous solution relied on many rules, those rules will largely be discarded, because the core of end-to-end is a purely data-driven multimodal large model. If the previous solution had already become mostly model-driven, that part of the code is likely to be reused in some form.

It should be emphasized that the change in R&D model brought about by end-to-end algorithms is what every OEM and autonomous driving company needs to focus on, and it is also the most painful part.

Wang Panqu also mentioned that, beyond the model itself, the end-to-end system requires more work on data: first, the data closed-loop system and its iteration efficiency need to be rebuilt; second, end-to-end testing and verification need to be addressed. The sensor input of the whole simulation platform must be very realistic, which is currently a very challenging technical problem. In terms of labor costs, however, an end-to-end intelligent driving system costs less overall than a non-end-to-end system, because it has only a few modules and a core team of 20 to 30 engineers should be enough.

In addition, Mao Jiming pointed out that the cost structure of intelligent driving solutions will change as the traditional modular architecture gives way to the end-to-end model: the labor cost of the many R&D experts who write rules will shift to data. This is good news for OEMs with mass production capabilities, because their cost of obtaining data is low, so the overall cost of their intelligent driving solutions will actually drop significantly.

On computing power investment, Lou Tiancheng said that buying high-performance chips will indeed raise costs in the short term. In the long term, once end-to-end technology matures and is applied, the initial investment will gradually be diluted.

Pure end-to-end needs less computing power than a modular architecture, but training costs at least 100 to 200 million yuan a year

"If you want to train an end-to-end model to a good level, you need to invest at least 100 to 200 million yuan in computing power a year. The figures for the passenger car track will definitely be even larger."

According to Wang Panqu, the computing power that end-to-end requires falls into two parts: training and deployment. Deployment comes down to the number of domain controllers that must be purchased. This cost is fixed and relatively low, and it is tied to the cost of each vehicle. The biggest cost is training, which can be handled either by buying GPUs and building in-house capacity or by working with cloud service providers. For car companies with relatively large order volumes, building their own data center is the cost-effective choice; for those with smaller order volumes or at an early R&D stage, renting servers from cloud service providers is the better choice.
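The build-versus-rent trade-off described above can be sketched as a simple break-even comparison. Everything in this example is an assumption for illustration: the function name, the cost model (a one-off purchase plus monthly running costs versus a flat monthly rental fee), and all the figures, which are hypothetical amounts in millions of yuan and not data from the article.

```python
def compare_training_compute(months, purchase_cost, monthly_running_cost,
                             monthly_rental_cost):
    """Compare the total cost of self-built and rented training compute.

    Hypothetical cost model: self-built capacity costs a one-off purchase
    plus a monthly running cost; rented capacity costs a flat monthly fee.
    Returns the cheaper option and both totals.
    """
    build_total = purchase_cost + monthly_running_cost * months
    rent_total = monthly_rental_cost * months
    cheaper = "build" if build_total < rent_total else "rent"
    return cheaper, build_total, rent_total


# Hypothetical figures in millions of yuan: a 120 purchase with 2 per month
# in running costs, versus renting equivalent capacity for 8 per month.
for horizon in (12, 36):
    choice, build, rent = compare_training_compute(horizon, 120, 2, 8)
    print(f"{horizon} months: build={build}, rent={rent}, cheaper={choice}")
```

With these made-up numbers, renting wins over a one-year horizon and building wins over three years, which matches the general shape of the argument: sustained, high-volume training favors owned capacity, while early-stage or smaller programs favor the cloud.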

Previously, Lang Xianpeng, vice president of intelligent driving at Li Auto, publicly revealed that Li Auto currently spends 1 billion RMB a year on training compute and expects to spend 1 billion USD a year in the future. "If we can't spend 1 billion USD per year on training, we may be eliminated in the future competition of autonomous driving."

On the scale of computing power, Lou Tiancheng believes that simply training an end-to-end autonomous driving model can be supported by a few hundred high-performance GPUs. For long-term investment with guaranteed end-to-end quality, the training compute at each autonomous driving company is basically at the level of thousands of GPUs, and car companies will invest even more.

Mao Jiming gave a more specific vehicle-side requirement: the whole system needs at least two NVIDIA Orin chips or a single NVIDIA Thor. He said that a pure end-to-end system needs less computing power than a modular architecture in total. In practice, however, a mass-produced end-to-end system often runs a bypass system alongside the main system, so its overall computing power requirement is generally comparable to that of the previous modular architecture.

However, Wang Panqu believes that as vehicle-side chips become more powerful, computing power will not be an obstacle to putting end-to-end on vehicles in the future. Lou Tiancheng holds the same view, saying that in moving from the classic architecture to end-to-end, the total amount of code will shrink significantly, and the computing resources consumed by the end-to-end neural network will not necessarily be much higher than those of the BEV model.

"The desire for higher computing power comes more from the increase in model parameters and model performance than from the end-to-end transformation." In addition, he pointed out that when applying end-to-end, companies should think more about how to make full use of the computing power of existing chips and improve utilization efficiency.