NIO, Xpeng and Li Auto go end-to-end: different forms, but the same joys and sorrows
2024-08-19
On the Chinese internet, Musk is often jokingly called the "god of open source," a jab at the claim that "without Tesla's open-sourcing, China could not build pure electric vehicles."

In intelligent driving, however, Tesla really did serve for quite a long time as a lighthouse pointing the way for the industry, sharing technical details at its AI Day events (note: there may be more than one lighthouse).

Since last year, though, Tesla has cancelled AI Day and now announces only its technical progress, not how that progress is achieved. All the outside world sees is a steady stream of good news about FSD, which uses an end-to-end technical solution. This year, when intelligent driving companies opened the end-to-end problem set handed out by Tesla, they found a line of small print in the answer column: "Solution: process omitted."

How do you do end-to-end without a reference answer? Is there a viable business model to support the research and development of end-to-end intelligent driving?

These questions were put first to NIO, Xpeng and Li Auto, the representatives of China's new car-making forces.

01
Open the black box
From last year to the first half of this year, the main theme of competition among China's leading intelligent driving players was the race to open up cities, fought with human-wave tactics under the traditional modular architecture.

In the process, the intelligent driving teams of NIO, Xpeng and Li Auto each grew to a thousand people (or more), working day and night on training, testing and verification to overcome corner cases.

Tesla's FSD has validated the end-to-end approach, giving everyone a chance to be freed from repetitive work.

The price is that, while every module of the traditional intelligent driving stack can be tested and verified, an end-to-end system is a black box: you see the results but not the process.

As NIO, Xpeng and Li Auto move toward end-to-end, they face a common problem: for a safety-critical function like intelligent driving, a complete black box is unacceptable. A way must be found to open the black box and understand "why the system wants to do this," or at least to make its output relatively controllable.

Xpeng has chosen a progressive, segmented end-to-end route. Its technical solution consists of the perception neural network XNet, the planning neural network XPlanner, and XBrain, a visual language model focused on scene understanding.

(Figure: Xpeng's segmented end-to-end solution)
In the end-to-end "chain of contempt," segmented end-to-end currently sits at the bottom.

Radicals argue that segmented end-to-end still does not escape the traditional approach. Although perception and planning have both become neural networks, one key point has not changed: the interface connecting the two networks is still defined by humans, which means information loss and a large amount of manual labeling. The overall pipeline is not conducive to global optimization or to automation.

But this is also where the advantages of segmented end-to-end lie. A human-defined interface means the system outputs intermediate results that humans can understand, which makes it easier to find and isolate problems without one change rippling through the whole system. If there is a problem with perception, for example, there is no need to retrain the entire network. Training two smaller models is also easier than training one large end-to-end model, and consumes less computing power.

More importantly, this approach in theory makes it easier to hold the lower bound of intelligent driving performance.

On July 30, after Xpeng's XNGP intelligent driving launched nationwide, He Xiaopeng said: "The building must be built layer by layer. Leapfrog development may be possible, but the risks will be extremely high."

The remark was read as a warning to rival carmakers.

In early July, Li Auto had used its summer launch event to present a one-step end-to-end solution under development: 4D One Model end-to-end. In this one-step solution, perception and planning are packed into a single neural network with hundreds of millions of parameters, and driving videos from experienced drivers become the main training data.
This solution supports lossless information transmission and a more automated data flow, which makes it more radical than the end-to-end solutions of Xpeng and Huawei.

However, it also suffers from weak generalization, poor interpretability and an unstable lower bound. For this reason, Li Auto runs a 2.2-billion-parameter VLM (Vision-Language Model) in parallel with the end-to-end model. The VLM has a stronger understanding of complex traffic scenes and of the text on traffic signs, and can serve as a reference for the end-to-end model's driving decisions, improving the performance of the overall system.

(Figure: Li Auto's end-to-end + VLM fast-and-slow dual-system intelligent driving solution)

In August, Li Auto opened internal testing of its end-to-end + VLM dual-system solution to thousands of professional users, and the company expects to push it to ordinary users by the end of this year or early next year.

Before this, Li Auto was not in the front rank in users' perception of intelligent driving, which hurt its sales (especially against its competitor AITO). Li Auto has defined end-to-end + VLM as a key battle for overtaking rivals in intelligent driving capability and entering the first echelon.

NIO, in contrast, takes an attitude toward end-to-end that is both conservative and radical.

NIO is conservative in that its use of end-to-end is very limited: it is not used for urban NOA, only for active safety functions. On July 11, NIO began rolling out an AEB built on an end-to-end solution, to address the insufficient scenario coverage of traditional AEB.

NIO is radical in that the intelligent driving system it will launch later this year appears set to skip the current wave of putting end-to-end models into cars and go straight to the next stage: putting a world model into cars.

The world model is the latest methodology the intelligent driving industry has arrived at.
At CVPR 2023, a top artificial intelligence conference, Tesla presented its world-model research. Wayve.ai, a startup well known in the industry for its autonomous driving world model GAIA-1, raised $1 billion in May this year.

By learning from a large number of real driving videos, a world model can predict and generate video of the driving scene over a certain period into the future and make correct driving decisions. In essence, it performs spatio-temporal reasoning. This resembles human driving: experienced drivers mentally predict and play out the behavior of other traffic participants and changes in traffic flow, and plan their own actions accordingly.

The world model goes one step further than today's end-to-end models in that its core task is not only to output a planned path, but also to "predict pixel changes in driving scenes." This extremely difficult task forces the model to learn not just the behavior of excellent drivers, but also broad traffic knowledge and physical common sense.

What NIO proposed at NIO IN is an even harder "World Model PLUS." It is more complex and has more output dimensions, which means more supervisory signals can be formed by comparing outputs against ground truth, speeding up training of the neural network and making the system less of a black box. The price is greater development difficulty.

(Figure: NIO's world model has a large number of prediction task outputs)
For reference, to train GAIA-1, a world model used only for demos that outputs only planned paths and videos, Wayve.ai used 4,700 hours of video data and trained for 15 days on 160 A100s. The world model NIO wants to train will require more than an order of magnitude more data and computing power.

Once training is complete, how to compress a large, complex world model and squeeze it into Orin-X, with its very limited computing power and bandwidth, while preserving accuracy and running speed, is another set of complex problems.

For now, NIO, Xpeng and Li Auto, all trying end-to-end for the first time (even if cautiously), have each felt to some degree the effect of end-to-end "raising the upper limit and lowering the lower limit."

For example, Xpeng's latest version of XNGP has gained an unprecedented U-turn capability, but user feedback indicates that its highway performance has regressed.

Li Auto's end-to-end + VLM solution, currently in internal testing, has a high ceiling and a floating floor.

NIO's end-to-end AEB, already released, performs very well at avoiding "ghost peeking" (pedestrians or vehicles suddenly emerging from a blind spot) in non-standard scenarios such as blind spots on curves, but users have also criticized it for braking by mistake more often.

02
Research and development of L4, prospects of L2?
Even though the lower bound is hard to control, automakers have all turned to end-to-end, largely because its higher ceiling promises a significant improvement in user experience and the business opportunities that come with it.

However, as car companies go deeper into end-to-end, one question lingers: does the return on investment in intelligent driving actually add up?

To sell more FSD, Tesla cut its subscription price from $199 a month to $99 a month in March this year (the buyout price dropped from $12,000 to $8,000). In May, however, an overseas data consultancy analyzed the credit card payment records of 3,500 users and concluded that the FSD conversion rate was only 2%, prompting Musk to push back on X: "The conversion rate is much higher than 2%, please."

But even a rate well above 2% is far from enough. Tesla is building a supercomputing cluster with an estimated 100,000 H100/H200 cards at its Texas factory. At a discounted price of $25,000 per H100, capital expenditure on the cards alone comes to about $2.5 billion (building them into a data center and keeping it running costs even more), equivalent to a full year of FSD subscription fees from about 2.08 million Tesla users.

The intelligent driving business model of Chinese car companies looks even less promising.

Xpeng has set aside 3.5 billion yuan for AI R&D this year, and Li Auto this month put the funding threshold for intelligent driving at US$1 billion. Yet both Xpeng's XNGP and Li Auto's NOA come standard on high-end trims at no extra charge. NIO did not adopt a free strategy: its high-end intelligent driving function, NOP+, is priced at 380 yuan per month.
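As a rough check on the Tesla figures above, the short Python sketch below recomputes the card spending and converts it into years of FSD subscription income. The function names are illustrative, and the inputs are simply the numbers quoted in the text. Rounding the monthly fee to $100 is an assumption made here because it appears to reproduce the 2.08 million figure; the quoted $99 price gives a slightly higher result.

```python
# Back-of-the-envelope check using only the figures quoted in the article.
CARDS = 100_000              # estimated H100/H200 cards in the Texas cluster
PRICE_PER_CARD_USD = 25_000  # discounted H100 price cited in the article
FSD_MONTHLY_USD = 99         # FSD subscription price after the March cut
ROUNDED_MONTHLY_USD = 100    # assumption: fee rounded up for the comparison


def card_capex(cards: int, price_per_card: int) -> int:
    """Total spend on computing cards, in US dollars."""
    return cards * price_per_card


def subscriber_years(capex: int, monthly_fee: int) -> float:
    """Number of one-year FSD subscriptions needed to match the spend."""
    return capex / (monthly_fee * 12)


capex = card_capex(CARDS, PRICE_PER_CARD_USD)
print(f"Card capex: ${capex:,}")
print(f"At ${FSD_MONTHLY_USD}/month: {subscriber_years(capex, FSD_MONTHLY_USD):,.0f} subscriber-years")
print(f"At ${ROUNDED_MONTHLY_USD}/month: {subscriber_years(capex, ROUNDED_MONTHLY_USD):,.0f} subscriber-years")
```

Under either fee, the order of magnitude is the same: the card purchases alone equal a year of subscription income from roughly two million users.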
NOP+ did bring in some revenue for a while, but new cars now come with one to two years of free NOP+ use. Tesla included, high-end intelligent driving in China and abroad is still at the stage of losing money to win attention.

The contradiction is this: after entering the urban NOA race and switching to the end-to-end paradigm, these companies are in effect investing in R&D at the intensity required for L3 or even L4 autonomous driving, yet the mainstream market still values the result as "L2 assisted driving that is not worth paying extra for as software."

The most promising way to close this gap between expected value and actual value seems to be entering the largest L4 autonomous driving market: Robotaxi.

In 2018, Morgan Stanley valued Waymo's Robotaxi business at $80 billion. Cathie Wood of ARK Invest, a die-hard Musk fan, forecast in June this year that Tesla's Robotaxi revenue would "conservatively" reach $603 billion in 2029, helping lift Tesla's market value to $7 trillion by then.

Before that, Musk had announced on Twitter that Tesla's Robotaxi model would be unveiled in August (since postponed to October 10).

03
Commercial returns have not yet been miraculous

Whether to build a Robotaxi has become a pressing question for NIO, Xpeng and Li Auto this year.

For Xpeng, the company most closely modeled on Tesla, the answer is yes. In July, He Xiaopeng publicly revealed that Xpeng Motors will launch a Robotaxi in 2026.

He Xiaopeng believes that the hardware requirements for a Robotaxi are far more complex than people imagine, but that the combination of end-to-end and large-model software algorithms, with its capacity for rapid improvement, is enough to solve L4 autonomous driving. He has set his team the goal of making the XNGP experience comparable to Google Waymo's Robotaxi by the second half of 2025.

But car manufacturing is already a capital-intensive business. Building a driverless ride-hailing platform on top of it would stretch the business chain and the payback period without limit.

Google invested $5 billion in Waymo this year, and Xpeng cannot possibly match that kind of wealth.

In June and July, He Xiaopeng visited Didi CEO Cheng Wei and Uber CEO Dara Khosrowshahi. He said that Xpeng does not want to operate Robotaxis itself, but hopes to supply vehicles and autonomous driving technology to global partners.

(Figure: Uber's Robotaxi, currently operated in cooperation with Hyundai)
The car companies' move to take on Robotaxi with end-to-end technology has drawn considerable pushback from L4 autonomous driving practitioners, including former TuSimple CEO Hou Xiaodi, Pony.ai CTO Lou Tiancheng, and Qingzhou Zhihang President Hou Cong. Some criticized the car companies for mythologizing end-to-end, while others said their capability systems are incomplete, but the core arguments are much the same.

Although the car companies' high-end intelligent driving has progressed rapidly, it is designed within the framework of assisted driving, where the core goals are availability and cost, whereas what matters most for a Robotaxi is reliability and safety. This difference in goals makes it hard for the two to share the same software and hardware, and hard for car companies to transition smoothly from high-end intelligent driving to Robotaxi [1][2][3].

This fits Li Bin's view exactly. In an interview on July 27, he made it clear that he "does not think Robotaxi is an exciting achievement and business model," and said angrily: "The value of intelligent driving is not to eliminate the jobs of private car drivers and taxi drivers who work so hard today."

He is also pessimistic about Robotaxi for another reason: because road resources are limited and governments regulate them, Robotaxis cannot be deployed without limit, which makes it hard for the business to achieve the high marginal returns of software and cloud services.

Li Bin has always maintained that people will want to own a car of their own, so the goal of NIO's intelligent driving is to free up drivers' energy and reduce accidents.
The key word in this business route is economies of scale: sell more cars to ordinary users, charge subscription fees for high-end intelligent driving to a large enough user base, spread the costs and earn a profit.

Against the high cost of intelligent driving, however, NIO's cumulative user base of fewer than 600,000 is still not enough, and exporting its intelligent driving capabilities has become an option. Ren Shaoqing, head of intelligent driving at NIO, recently said for the first time that NIO is willing to open its intelligent driving solutions to other car companies, just as it opened up battery swapping.

Li Xiang's view of intelligent driving, in contrast, is "radical in technical judgment and conservative in business strategy."

At the Chongqing Automotive Forum this year, Li Xiang set a goal: the combination of end-to-end + VLM will achieve L4 autonomous driving within three years. The difference is that Li Auto has never considered a Robotaxi business. To this day it has shown no interest in charging for intelligent driving software, and the slogan on its official website is still "Full-scenario intelligent driving, lifetime zero subscription fee."

This has to do with the competitive situation Li Auto is in. Over the past year, its sales have come under significant pressure from Hongmeng Zhixing, whose sharpest spear is Huawei's ADS intelligent driving capability.

With Hongmeng Zhixing and Huawei ADS 3.0 (buyout price of about 10,000 yuan) surging and adding to that pressure, Li Auto's AD Max intelligent driving, now better to use than before yet still free, can help it win more orders.
Unlike Xpeng and NIO, Li Auto sets the KPI for intelligent driving not as generating operating income but as serving vehicle sales.

However, as China's auto industry enters its knockout stage, the intelligent driving businesses of NIO, Xpeng and Li Auto cannot stay in "making friends" mode for long.

A single training card costs upward of 100,000 yuan, and the labor cost of a 1,000-person team starts at 1 billion yuan a year. Intelligent driving, already one of the most cash-burning businesses at NIO, Xpeng and Li Auto, has now set out on an even more resource-hungry path of "brute force works miracles," but whether it will also reap the rewards of "brute force works miracles" remains highly uncertain.

[1] Hou Xiaodi, the Cautious Warrior, Jiazi Light Years
[2] Talking about Robotaxi with Lou Tiancheng: "The more advanced L2 is, the further it is from L4", Tencent Auto
[3] Tesla thinks Robotaxi is too simple | Dialogue with Hou Cong, co-founder of Qingzhou Zhihang, Yunjian Insight