
The end-to-end controversy: the endgame for L4 autonomous driving, or a marketing feast?

2024-09-24


Tesla's release of FSD V12 marked a turning point: seemingly overnight, intelligent driving entered the end-to-end era.
"The lower-limit capability of end-to-end models is expected to rise rapidly next year. Once it does, it will be possible to exceed the L4 standard on a global scale in less than two years." Speaking at the 2024 Hangzhou Yunqi Conference, He Xiaopeng, chairman of XPeng Motors, said that with the end-to-end large model, Tesla's FSD is completely different from before and may outperform human drivers next year.
XPeng Motors was one of the first Chinese automakers to follow Tesla. At the end of July this year, it began pushing its XNGP intelligent driving system, based on an end-to-end large model, to users. By September, Huawei, Li Auto and others had also begun rolling out intelligent driving systems based on end-to-end large models; NIO applied an end-to-end large model to its AEB system and released its self-developed world model.
With end-to-end large models in hand, car companies are promoting intelligent driving more aggressively. Once-popular selling points such as high-precision maps have fallen out of favor, and door-to-door and point-to-point driving assistance systems have been formally put on the agenda. XPeng Motors even claims that it can deliver an L3+ autonomous driving user experience at the hardware cost of L2 intelligent driving.
For a time, intelligent driving systems without end-to-end capabilities seemed doomed to fall behind. "All intelligent driving systems that do not use large models will be eliminated," He Xiaopeng said, adding that all L4 autonomous driving companies should switch to large models as soon as possible.
Chentao Capital, together with three other parties, released the "End-to-End Autonomous Driving Industry Research Report" (hereinafter the "report"). Of the more than 30 front-line autonomous driving experts interviewed for the report, 90% said their companies have invested in end-to-end R&D, and most technology companies believe they cannot afford to miss this technological revolution.
However, not all players agree that the end-to-end large model will upend the current intelligent driving landscape.
Hou Cong, CTO of Qingzhou Zhihang, told a China Business Network reporter that he had tried Tesla's FSD V12.3 in the United States. Although it was a great improvement over earlier versions of FSD, it still lagged clearly behind Waymo's robotaxi, which relies mainly on conventional planning and control. Hou Xiaodi, a founder formerly of TuSimple, called on the industry to stay rational and not to deify end-to-end.
In this controversy, car company leaders such as Musk and He Xiaopeng back end-to-end, while executives of L4 autonomous driving companies such as Hou Cong, Hou Xiaodi, and Lou Tiancheng (CTO of Pony.ai) believe that the end-to-end large model cannot directly upgrade L2 driving assistance technology to L4 autonomous driving.
The report also notes that because the technology is still at an early stage, putting end-to-end large models into vehicles faces many difficulties and pain points: widely divergent technical routes, heavy demands for data and computing power, immature testing and verification methods, and huge resource investment.
On the road to the endgame of autonomous driving, the end-to-end large model has become the latest contested technical route, following earlier disputes such as pure visual perception versus radar-fusion perception.
Is Tesla leading the technological revolution again?
From integrated die casting to battery-body integration, Tesla has been the industry benchmark for new energy vehicle technology, and many Chinese automakers are said to be "crossing the river by feeling for Tesla". With end-to-end large models now in cars, Tesla has once again led a transformation of new energy vehicles.
Before end-to-end large models reached the car, intelligent driving assistance systems were mostly divided into modules such as perception, planning, decision-making, and control. Artificial intelligence and machine learning were used mainly in perception, planning and other areas, but the modules were largely defined by hand-written rules, an approach known as "rule-based".
In actual operation, however, vehicles keep running into an endless stream of corner cases (the long-tail problem). To handle them, engineers must write code and set rules for each specific scenario, so rule-based driving assistance and autonomous driving systems often require a large number of rules to be entered by hand.
Wu Xinzhou, global vice president of NVIDIA and head of its automotive business unit, believes that most existing autonomous driving algorithms are rule-based. Going from what the car sees to what it does sounds simple, but setting good rules is hard: it requires many human engineers to anticipate as many possibilities as they can, and this method has an upper limit.
Unlike traditional rule-based driving assistance systems, an end-to-end autonomous driving solution handles the entire process, from perception to planning and control, with deep learning.
Applied to autonomous driving, end-to-end technology turns the original architecture, which chains together separate models for perception, prediction, and planning, into a single-model architecture that integrates perception and decision-making.
According to a research report released by China Cinda Securities, "end-to-end" means that environmental data such as images goes in at one end, passes through a multi-layer neural network that behaves like a "black box" in the middle, and driving commands such as steering, braking, and acceleration come out directly at the other end.
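The contrast between the two architectures can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any automaker's implementation: the `Controls` type, the "sensor input" reduced to a list of obstacle distances in metres, the hand-written thresholds, and the weights of the stand-in "network" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controls:
    steering: float  # -1.0 (full left) to 1.0 (full right); hypothetical units
    brake: float     # 0.0 to 1.0
    throttle: float  # 0.0 to 1.0


# --- Rule-based modular pipeline: perception -> planning -> control ---

def perceive(distances: List[float]) -> float:
    """Perception: reduce raw readings to the nearest obstacle distance."""
    return min(distances)


def plan(nearest_m: float) -> str:
    """Planning: hand-written rules with illustrative thresholds."""
    if nearest_m < 10.0:
        return "stop"
    if nearest_m < 30.0:
        return "slow"
    return "cruise"


def control(decision: str) -> Controls:
    """Control: map the symbolic decision to actuator commands."""
    table = {
        "stop": Controls(0.0, 1.0, 0.0),
        "slow": Controls(0.0, 0.3, 0.1),
        "cruise": Controls(0.0, 0.0, 0.5),
    }
    return table[decision]


def modular_drive(distances: List[float]) -> Controls:
    return control(plan(perceive(distances)))


# --- End-to-end: one learned function from raw input to driving commands ---

def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def end_to_end_drive(distances: List[float],
                     weights: List[List[float]],
                     biases: List[float]) -> Controls:
    """A single linear layer stands in for the deep network.

    There are no intermediate symbols such as "stop" or "slow": behaviour
    lives entirely in the learned numbers, which is why the model is
    described as a black box.
    """
    raw = [sum(w * x for w, x in zip(row, distances)) + b
           for row, b in zip(weights, biases)]
    return Controls(
        steering=clamp(raw[0], -1.0, 1.0),
        brake=clamp(raw[1], 0.0, 1.0),
        throttle=clamp(raw[2], 0.0, 1.0),
    )
```

In the modular version, a wrong decision can be traced to one function and fixed by editing a rule; in the end-to-end version, changing behaviour means changing `weights` and `biases`, which in practice means retraining on more data.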
Compared with the traditional rule-driven modular architecture, an end-to-end implementation brings a series of advantages. It is optimized globally for the whole task in a fully data-driven way, so errors can be corrected better and faster. It reduces lossy transmission, delay, and redundancy of information between modules, avoids error accumulation, and improves computing efficiency. And by shifting from rule-based to learning-based, it generalizes better, with zero-shot learning capabilities and stronger decision-making in unfamiliar scenarios.
With end-to-end large models, intelligent driving systems can iterate and improve faster. Take XPeng's XNGP as an example. After adopting end-to-end large models, its three-part stack (the neural network XNet, the planning and control large model XPlanner, and the large language model XBrain) can be iterated every 2 days, and its intelligent driving capability has increased 30-fold in 18 months; its data systems and neural network architecture allow long-tail problems to be diagnosed and solved within hours.
With the arrival of Tesla's end-to-end large model, the intelligent driving technology route of Chinese automakers also began a major transformation in 2024.
In the past few years, technical route disputes among Chinese automakers mostly centered on visual perception versus fusion perception, and competition in the market was largely about how fast, and in how many cities, a system was opened. At the beginning of 2024, Huawei, XPeng and others were still competing over driving without high-precision maps and truly "nationwide" availability.
Once end-to-end large models were in the car, the generalization ability of driving assistance systems improved greatly, and verifying and opening individual regions mattered less. End-to-end also blurred the earlier boundaries between perception, planning, decision-making, control and other modules. Many car companies have begun reorganizing their autonomous driving teams around the needs of end-to-end large models.
At the end of 2023, Li Auto adjusted the organizational structure of its intelligent driving team. It set up a dedicated large-model team under the front-end algorithm R&D team, responsible for the overall R&D and in-vehicle deployment of the end-to-end architecture. In 2024, NIO established a large model department, a deployment architecture and solutions department, and a space-time information department, and abolished its original perception department, planning and control department, environmental information department, and solution delivery department.
Although end-to-end is in full swing, most Chinese car companies have not yet achieved "one-model" end-to-end intelligent driving in the theoretical sense.
The CTO of an autonomous driving company told reporters that end-to-end intelligent driving can be divided into two stages. The first is a two-model solution, consisting of end-to-end perception plus end-to-end planning and control, which is currently the more mainstream direction in the industry. The second is a one-model solution, in which a single large model handles everything from information input to decision output. This is closer to the direction of AGI, but it is harder, and it is estimated to take 3 to 5 years before it sees any large-scale application.
At present, the industry generally believes that Chinese car companies trail Tesla in R&D progress by about 1.5 to 2 years. Gu Junli, deputy general manager of Chery Automobile Co., Ltd., believes that to catch up with Tesla in terms of business model, products must first reach scale. "When the data reaches Tesla's scale of more than one million, then through intensive training the model can learn from the video stream and directly tell the driver which way to drive, just like the popular ChatGPT," Gu Junli said.
Do vehicle manufacturers and suppliers differ on the route?
While many car companies have launched end-to-end large models and are touting the arrival of the autonomous driving era, many suppliers focused on autonomous driving take a different view.
"After Tesla launched the end-to-end FSD, some problems arose. The car was always prone to going onto the shoulder of the road, especially at night. Sometimes it would get scratched, and sometimes it would rush straight onto the shoulder and flatten the tires." Hou Cong told reporters that Waymo, also in the United States, has not adopted an end-to-end large model, yet it already runs driverless robotaxi services in multiple cities, and user feedback has been quite good.
The end-to-end large model is not in itself a new technology that has only broken through in recent years.
"Before deep learning emerged around 2010, these were all called model analysis algorithms. At that time, we were doing pedestrian detection at Tsinghua University. We had to extract feature information from the image, such as the curvature of a person's shoulder, the color of their eyes, and so on. We summarized these features by hand, which is rule-based. After deep learning came out, we fed in images and let the network learn on its own. In the end, each person's features were learned by deep learning, not defined by humans. This is the same as today's end-to-end, which is learning-based." Hou Cong told reporters that such a system, like today's end-to-end driving assistance, requires massive amounts of data.
This need for data is also considered one of the main reasons automakers are racing to adopt end-to-end large models.
Compared with L4 autonomous driving suppliers, which operate test fleets of only a hundred-odd vehicles, automakers usually have hundreds of thousands or even millions of vehicles on the road. Users generate massive amounts of data as they drive, which helps automakers train their end-to-end intelligent driving systems and iterate quickly.
In addition, Dong Jun, an engineer at an L2+ driving assistance system supplier, told reporters that for suppliers, end-to-end intelligent driving is hard to turn into a standardized product: a change in body shape or in sensor mounting positions means the model must be retrained for the whole system, which is costly, slow, and inefficient.
For L2 driving assistance, the significance of the end-to-end large model is that it speeds up the opening of cities and brings closer the "nationwide driving" that car companies talk about. For L4 autonomous driving companies, it can likewise reduce the system's dependence on high-precision maps in the initial stage of operation, allowing them to expand their operating area more quickly; in the middle and late stages of operation, however, high-precision maps still matter and can further improve the reliability, safety, and smoothness of the autonomous driving system.
On the other hand, unlike Tesla and Li Auto, which are already profitable, most autonomous driving companies still rely mainly on financing. The end-to-end large model requires not only massive amounts of data but also heavy capital investment.
"In the future, when intelligent driving enters the L4 stage, data and computing power will grow exponentially every year, which means at least 1 billion US dollars will be needed each year, and iteration will still have to continue after 5 years. At that scale, a company whose profits cannot support the investment will find it very hard to keep going. So there is no need to focus on how many billions are invested in autonomous driving; start from the essentials, see whether there is enough computing power and data, and then work out how much money needs to be invested," Lang Xianpeng, vice president of intelligent driving R&D at Li Auto, told reporters.
Xia Yiping, CEO of Jiyue Automobile, believes that 20 billion yuan was once seen as the financial threshold for building cars, but that today a company cannot do intelligent driving well without 50 billion yuan.
More importantly, autonomous driving companies such as Waymo and Pony.ai, which aim to deliver L4 robotaxis, weigh system weight, cost and similar factors very differently from vehicle manufacturers.
Unlike L2 driver assistance, at L3 and above the main responsibility for accidents shifts to the vehicle, which places extremely high demands on the stability and safety of the autonomous driving system. The unexplainability of the end-to-end "black box" introduces risk into such a system.
"Automakers have launched end-to-end large models for intelligent driving one after another and promoted them heavily. The core purpose is to differentiate themselves and sell cars," said Dong Jun.
Hou Xiaodi said in a media interview that if Tesla's FSD causes an accident, the driver is still responsible: Tesla requires drivers to keep their hands on the steering wheel at all times, so the accident has nothing to do with Tesla. In addition, Tesla's business is selling cars, and FSD is an added value in that sale. A company thinking about how to sell more cars cannot confine itself to a limited area, as L4 does, and solve every corner case within it.
Hou Cong and other interviewees from autonomous driving companies pointed out that L4 autonomous driving requires 100% safety and cannot accept the unexplainability and uncertainty of the end-to-end "black box". In addition, the business logic of L2 and L4 differs hugely.
For vehicle manufacturers, selling cars is the main business, and costs determine profits and market competitiveness, so they inevitably cannot build much safety redundancy into their products. L4 robotaxi is more of an operations business; it will remain mainly B2B for quite a long time and will not serve consumers directly, so these companies must consider not only the cars but also the many situations that arise in vehicle operations.
"For example, what if the car gets stuck, what if the hardware breaks down, or what if an accident occurs? This requires more redundancy, and Tesla cannot reserve as much redundancy as Waymo because the two have different business logics," said Hou Cong.
Will the world model enable autonomous driving?
Despite the differences, many engineers at autonomous driving companies also agreed in interviews that putting end-to-end large models in vehicles can improve the current capabilities of driving assistance systems. Many practitioners described a "seesaw" effect: the end-to-end large model can raise the upper limit of a system's capabilities, but it also lowers the lower limit of its performance.
"The end-to-end large model is trained as a probabilistic model. One problem is that for relatively simple, easy-to-describe scenarios, its output is often not that accurate, and its lower limit is relatively low. Tesla has done a pretty good job in this regard but has not completely solved the problem. We believe that while sufficient data is still lacking, we need to move to end-to-end gradually, replacing one module at a time, so that the transition is completed with a safety net in place. With this relatively solid engineering foundation and rapid iteration, we can raise the upper limit of system performance step by step while also protecting the lower limit," said Chen Liming, president of Horizon Robotics.
The end-to-end large model is data-driven, taking sensor data as input and producing driving decisions as output. The process in between is highly unexplainable: people cannot see how the system arrived at its final decision, which is why it is often likened to a black box.
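One way to picture the "safety net" idea, given that the learned model's reasoning cannot be inspected, is a hand-written rule that bounds the model's output from below. The sketch below is a minimal illustration under stated assumptions, not Horizon Robotics' or any automaker's design: the function names, the time-to-collision rule, and the 2-second threshold are all hypothetical.

```python
def rule_based_brake(gap_m: float, speed_mps: float,
                     min_ttc_s: float = 2.0) -> float:
    """Hand-written lower bound: demand full braking when the
    time-to-collision (gap / speed) drops below a threshold."""
    if speed_mps <= 0.0:
        return 0.0  # vehicle is stationary, nothing to enforce
    time_to_collision_s = gap_m / speed_mps
    return 1.0 if time_to_collision_s < min_ttc_s else 0.0


def guarded_brake(learned_brake: float, gap_m: float,
                  speed_mps: float) -> float:
    """Take the stronger of the learned command and the rule.

    The learned model can still brake earlier or harder than the rule
    (its upper limit is untouched), but it can never brake less than the
    rule demands (the lower limit is guaranteed).
    """
    return max(learned_brake, rule_based_brake(gap_m, speed_mps))


# Hypothetical usage: the learned model under-brakes with 10 m to go at 20 m/s.
command = guarded_brake(learned_brake=0.1, gap_m=10.0, speed_mps=20.0)
```

The design choice here is that the rule only overrides; it never replaces the learned output in ordinary driving, so the learned component can be swapped or retrained without touching the guarantee.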
Hou Cong likens the difference between today's end-to-end large-model approach and the earlier rule-based approach to car manufacturing. "In the past, car companies bought parts from different companies and put them together. First, this made procurement convenient and spread out the suppliers, so the company was less likely to be 'choked' by any one of them. Second, it is easy to repair: whatever breaks can be fixed. The same is true of multi-module autonomous driving. Its advantage is that problems can be better defined and solved."
Take traditional multi-module autonomous driving as an example: if the system shows problems during testing, R&D personnel can find the bug in the corresponding module and fix it. With a black box such as an end-to-end large model, however, they can only adjust the training strategy, retrain, or modify the model; they cannot directly edit the parameters inside the "black box". And as the system is upgraded and iterated, the harder the problems it has to solve, the more investment is required, which raises the threshold for end-to-end large models.
On the other hand, although end-to-end large models are data-driven, massive amounts of data do not necessarily improve the system.
Xiao Bo, head of Pony.ai's AI team, believes that even with a good algorithm and good training, the ability learned from massive human driving data is roughly that of an average human driver. That is enough for L2 driving assistance, but L4 and above requires ability 10 times or more that of a human driver, and this approach cannot deliver it.
Just as end-to-end is rapidly catching on, Chinese automakers and suppliers have put forward another new concept, the "world model". Lou Tiancheng believes that the world model is the best and most important thing at present, and regards it as the only solution to autonomous driving.
The world model can be understood as a simulation and model of the real world that can accurately reproduce how scenes such as intersections evolve: for example, the trajectory of a pedestrian who suddenly emerges from behind an obstruction (the so-called "ghost probe"), the reactions of pedestrians and other vehicles at the moment of a collision, and even details such as the fact that a running person's deceleration can reach the acceleration of gravity. The world model also serves as a scoring system for evaluating autonomous driving systems: it can tell whether system A is better than system B.
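The "scoring system" role can be sketched as a simulator that replays the same scenarios against two driving policies and compares the results. This is a toy illustration only: a real world model is a learned generative simulator, whereas the stand-in below uses simple one-dimensional kinematics, and the scenarios, the 8 m/s² braking figure, the two policies, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Scenario = Tuple[float, float]            # (initial speed in m/s, gap to pedestrian in m)
Policy = Callable[[float, float], float]  # (speed, gap) -> deceleration in m/s^2


def simulate(policy: Policy, scenario: Scenario,
             dt: float = 0.1, max_steps: int = 200) -> bool:
    """Return True if the vehicle stops before reaching the pedestrian."""
    speed, gap = scenario
    for _ in range(max_steps):
        if gap <= 0.0:
            return False  # reached the pedestrian: collision
        if speed <= 0.0:
            return True   # came to a stop with room to spare
        speed = max(0.0, speed - policy(speed, gap) * dt)
        gap -= speed * dt
    return speed <= 0.0 and gap > 0.0


def score(policy: Policy, scenarios: List[Scenario]) -> float:
    """Fraction of scenarios handled without a collision."""
    passed = sum(1 for s in scenarios if simulate(policy, s))
    return passed / len(scenarios)


def system_a(speed: float, gap: float) -> float:
    return 8.0 if gap < 60.0 else 0.0  # brakes early


def system_b(speed: float, gap: float) -> float:
    return 8.0 if gap < 15.0 else 0.0  # brakes late


# A pedestrian appears 80 m ahead at three different vehicle speeds.
scenarios = [(10.0, 80.0), (20.0, 80.0), (25.0, 80.0)]
better = "A" if score(system_a, scenarios) > score(system_b, scenarios) else "B"
```

Because both systems face identical simulated scenes, the comparison does not depend on which situations each happened to meet on real roads.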
Car companies such as NIO and Li Auto have already released their own "world models" one after another.
Ren Shaoqing, vice president of autonomous driving at NIO, said: "Compared with conventional end-to-end models, we believe the new world model has three main advantages. The first is spatial understanding: through generative models, by reconstructing sensor data, we can extract information in a more generalized way. Second, through autoregressive models, we can automatically model the environment over long time spans. Third, the world model requires more data, and with self-supervision there is no need for manual labeling. It is a multivariate autoregressive generative model structure that allows us to learn better."
Lou Tiancheng believes that the world model can be understood as a simulated "coach". For an L2 system, the coach drives as well as an experienced human driver; for an L4 system, it drives far better than human drivers. If it is used to train the intelligent driving system, the result will definitely be better than a human driver.
Controversy remains, but most respondents still believe that at the L2 driving assistance stage, end-to-end large models can indeed raise the performance ceiling of these systems. What most practitioners at L4 autonomous driving companies object to is the claim, touted by Tesla, XPeng and other automakers, that products built on L2 intelligent driving can, with end-to-end technology, achieve L4 autonomous driving capabilities on L2-level hardware.
"At this stage, car companies are vigorously promoting end-to-end and casting it as the cutting-edge technology that leads to autonomous driving. The reason behind this is mostly to sell more cars," said Dong Jun.
(This article comes from China Business Network)