Taxi drivers, don't panic. Programmers in the intelligent driving industry will lose their jobs to AI sooner

2024-07-17

Has the "GPT moment" of intelligent driving come?

Author | Cao Siqi
Editor | Jingyu

Every new technology will go through different stages from its birth to its promotion, and will also face different opinions. In order to find the best solution for the technology, developers may give up years of hard work; while commercial organizations are more inclined to judge the timing of technology implementation in order to maximize benefits at the right time.

Chinese automakers were once deeply divided over intelligent driving. Supporters believed that it could bring a "far-ahead" experience, while opponents expressed their disdain by saying "they are just idiots about technology" and "autonomous driving is all bullshit".

In 2024, with the official release of the "end-to-end" Tesla intelligent driving software FSD V12, Chinese automakers' attitude towards intelligent driving finally began to converge.

Taking the new forces in the automotive industry, such as NIO, Xiaopeng and Li Auto, as examples, each company has clearly begun to pursue "end-to-end" technology.

Xiaopeng proposed to introduce end-to-end large models into the intelligent driving system, and said that it would invest 4.2 billion yuan in intelligence and training data this year, with the goal of achieving "internal OTA every two days" in the future. This is an efficiency improvement that was unimaginable in the past when humans maintained hundreds of thousands of lines of intelligent driving code.

NIO also recently reorganized its intelligent driving R&D department, merging the traditional perception team and planning-and-control team into a single large model team, with the core aim of advancing a development paradigm based on neural networks.

Even Li Auto, which was ridiculed as a "stingy factory" in the past, has recently been promoting its intelligent driving R&D. CEO Li Xiang personally advocated "end-to-end" R&D and invoked a Nobel-winning economist's theory of fast and slow thinking to illustrate that his team has found a way to solve the corner cases of autonomous driving.

So what is it about end-to-end that has moved manufacturers from disagreement to consensus? How has it changed the paradigm of the intelligent driving industry, and what opportunities and adjustments will it bring?

The GPT moment of intelligent driving has arrived

An important reason why domestic manufacturers quickly reached a consensus is that Tesla was the first to turn in an enviable end-to-end result.

In March this year, Tesla officially released version V12.3 of its FSD smart driving software. The biggest change in this version is that the core of the entire smart driving system has switched from human-written code to a large AI model based on a neural network. Musk used "Video in to Control out" to describe this new working paradigm: the AI directly outputs driving operations based on the road information it "sees", which the industry usually calls "end-to-end".

Last month, He Xiaopeng tried FSD V12.3.6 in California. In his words, FSD "handles many road conditions very smoothly." This is exactly the biggest advantage of a neural network over hand-written rules: it greatly improves the intelligent driving system's ability to generalize across different cities and road conditions.

Translated into the marketing language more familiar to domestic consumers, it means: it can be driven anywhere in the country (or the world).

Huawei announced the slogan "Can be used nationwide" in September last year | Source: Geek Park

Of course, this conclusion is just a beautiful wish at this stage. In the actual operation process, it still needs the full support and training of AI infrastructure such as data, algorithms, and computing power to approach the goal of "AI becoming as smart as human drivers."

But for peers, FSD V12 is of great significance. It verifies that neural networks can really replace human-written code, and even do better and more efficiently.

This means that there is no need to wait for N years, the ChatGPT moment in the intelligent driving industry has actually arrived. Think about what Alibaba's Zhang Yong once said: All software is worth redoing with AI. FSD V12 has given peers a new direction and confidence: All intelligent driving technology stacks can be redone end-to-end.

When the FSD V12 beta version was released, Musk said that this version compressed the 300,000 lines of code of the previous version to 2,000 lines, which is less than one percent.

Competition on the new technology stack will not degenerate into an anti-innovation rat race over who has the most engineers. If the efficiency of AI can really reach the "internal OTA every two days" that He Xiaopeng described, then the human-wave tactic of writing rules and fixing bugs one by one can be declared completely outdated.

So does the intelligent driving industry still need so many programmers? I cannot give an accurate answer, but what is certain is that the work of intelligent driving programmers will change. Programmers who can only write if-else rules will most likely be replaced by AI earlier than taxi and ride-hailing drivers.

Trapped in data

In the "End-to-End Autonomous Driving Industry Research Report" released by investment firm Chentao Capital last month, only 13% of the more than 30 respondents in the autonomous driving industry expressed a relatively cautious "wait-and-see" attitude towards end-to-end technology, while the rest expressed a more active "pre-research" or even "full investment" attitude. End-to-end has become a consensus among industry practitioners.

But in fact, no company (including Tesla) has achieved "fundamentalist" end-to-end, that is, concentrating every stage of autonomous driving in a single large model that, like a human, takes visual signals as input and outputs pedal and steering wheel operations.

The core effort of most domestic OEMs at this stage is to connect the perception and decision-making modules. The key is to remove the manually defined outputs passed between modules and pass feature vectors instead, so that information is transmitted without loss.

Schematic diagram of the architecture evolution of end-to-end autonomous driving | Image source: Chentao Capital

Before end-to-end, the traditional autonomous driving architecture originated from the field of robotics and was divided into modules such as perception, planning, and control. Different modules were developed by different teams, and information was passed between them mainly through manually defined interfaces. To give the simplest example, in the traditional perception module, whether a vehicle is crossing a lane line can be represented as a single binary value.

The biggest benefit of connecting the perception and decision-making modules is that the system can cover more of the real-world "gray-area scenarios" that rules cannot describe precisely. For example, when you are driving, you don't need to know the exact speed of the car in front of you, or whether it has crossed the line; you just need to pay attention to how its relative position changes.
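The contrast between a hand-defined interface and a feature-vector interface can be sketched in a few lines of Python. This is only an illustrative toy, not any manufacturer's implementation: the function names, the 1.75 m half lane width, the two-number "feature vector", and the weights and threshold are all assumptions made up for the example. In a real system both the features and the planner would be learned by a neural network.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PerceptionOutput:
    """Hand-defined interface: everything perception saw is reduced to one flag."""
    crosses_line: bool


def perceive_modular(lateral_offset_m: float,
                     lane_half_width_m: float = 1.75) -> PerceptionOutput:
    # Hypothetical rule: the neighbouring car "crosses the line" only once
    # its lateral offset exceeds half the lane width.
    return PerceptionOutput(crosses_line=abs(lateral_offset_m) > lane_half_width_m)


def plan_modular(perception: PerceptionOutput) -> str:
    # The planner only ever sees the binary flag.
    return "slow_down" if perception.crosses_line else "keep_speed"


def perceive_features(offsets_m: Sequence[float]) -> List[float]:
    # Stand-in for a learned feature vector: the latest offset plus how far
    # the car has drifted over the observed frames.
    return [offsets_m[-1], offsets_m[-1] - offsets_m[0]]


def plan_from_features(features: Sequence[float],
                       weights: Sequence[float] = (0.3, 1.0),
                       threshold: float = 0.8) -> str:
    # Stand-in for a learned planner: a weighted score over the features.
    # The weights and threshold are invented for illustration.
    score = sum(w * abs(f) for w, f in zip(weights, features))
    return "slow_down" if score > threshold else "keep_speed"


# A car drifting toward the lane line without having crossed it yet.
drifting = [0.6, 1.0, 1.4]
print(plan_modular(perceive_modular(drifting[-1])))
print(plan_from_features(perceive_features(drifting)))
```

With the drifting car, the flag-based pipeline keeps its speed because the line has not been crossed, while the feature-based pipeline slows down because the change in relative position is still visible to the planner.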

On this basis, and following the theory behind generative AI, the expectation is that neural network models will also exhibit emergent intelligence after being fed a large amount of data and become AI agents.

The basis of all this comes from data, which is the training material that is "fed" to the model. However, unlike large text-based language models, it is not easy to find enough public video data as training material for intelligent driving models.

The aforementioned "End-to-End Autonomous Driving Industry Research Report" shows that the largest public data set currently has only 1,200 hours of data. According to Musk's statement in 2023, Tesla invested nearly 40,000 hours of video for training in the early stages of end-to-end.

Compared with other car companies, Tesla's data advantage lies mainly in the large number of mass-produced cars.

Currently, Tesla has delivered more than 6 million vehicles worldwide, while China's new intelligent driving players have each delivered only a fraction of that. Tesla's consistently minimal SKU lineup and intelligent driving hardware pre-installed on every car make data collection easier still.

The conventional practice in China used to be collecting road information manually. However, training a smart end-to-end model also requires data covering as many edge scenarios (corner cases) as possible. Since edge scenarios appear very randomly, some manufacturers have said that manual collection alone can obtain only a limited share of the data, about 2%.

In addition, compared with Tesla, domestic manufacturers often have more complex SKUs. Due to differences in vehicle size, sensor layout, etc. between different models, the relevant parameters in the model also need to be realigned.

Taking Huawei as an example, Hongmeng Zhixing has demonstrated strong retail sales capabilities in the past year or so, but for the different brands and models served by Huawei's car BU, engineers are still needed to align and deliver the end-to-end implementation. The same is true for NIO, which has two brands and nine models. It reorganized its integration team into a delivery team.

After Sora was released, Musk tweeted that Tesla uses AI to simulate real-world driving | Image source: X screenshot

There is a view that video generation products such as Sora may become a source of training material for end-to-end models. But even Musk has not publicly endorsed using AI-generated content to train AI. After all, data is too important for model training. Even Musk, who has always been extremely stingy with labor costs, hired a team of 1,000 people in New York to annotate Tesla's road video data.

Don’t be led astray by Musk

Turning to end-to-end may sound like the natural choice, but deleting 300,000 lines of code and breaking up and reorganizing an existing organizational structure is definitely not an easy decision. In fact, even Musk took this path by chance. The engineer who first proposed to him, at the end of 2022, building a ChatGPT-style neural network for intelligent driving was almost reassigned by Musk to other problems after the Twitter acquisition.

After the end-to-end model is trained, the corresponding support system (including computing power, etc.) must be efficient enough. Ren Shaoqing, vice president of intelligent driving research and development at NIO, said in an interview with Tencent Deep Net that forcing end-to-end without the basic capabilities is like taking "poison".

He said: "If your original code structure is clear enough, your (debug) testing may be only 1%. Originally you spent three days to retest 1%, but now you have to spend three days to retest 100%. So your data verification system must be efficient enough."
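The arithmetic behind this point can be written out as a small Python sketch. The function name and its parameters are hypothetical, and the only inputs are the figures from the quote: 1% retested in three days before, 100% retested in three days now.

```python
def required_speedup(old_fraction: float, new_fraction: float,
                     old_days: float, new_days: float) -> float:
    """How much faster validation must run to retest `new_fraction` of the
    system in `new_days`, compared with `old_fraction` in `old_days`."""
    if min(old_fraction, new_fraction, old_days, new_days) <= 0:
        raise ValueError("all arguments must be positive")
    old_rate = old_fraction / old_days  # share of the system retested per day
    new_rate = new_fraction / new_days
    return new_rate / old_rate


# Figures from the quote: 1% in three days before, 100% in three days now.
speedup = required_speedup(0.01, 1.0, 3.0, 3.0)
print(f"validation must run roughly {speedup:.0f}x faster")
```

Under these figures the verification pipeline has to cover about a hundred times more ground in the same three days, which is why the efficiency of the data verification system matters so much.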

But don't be led into a ditch by Tesla. End-to-end has only proved that it has the potential to improve work efficiency, but it has not proved that it is the ultimate solution to autonomous driving.

This is consistent with the industry's understanding of whether Scaling Law can lead to AGI (artificial general intelligence) in the physical world: it is certain that generative artificial intelligence can have higher intelligence, but whether it can understand the laws of physics and apply them in fields such as autonomous driving and robotics is still unknown in academia. In the "End-to-End Autonomous Driving Industry Research Report", more than half of the practitioners do not believe that end-to-end is the final solution for autonomous driving technology.

For OEMs that develop their own intelligent driving, the most pragmatic approach at this stage is to rely on end-to-end to deliver intelligent driving capabilities as quickly, efficiently and economically as possible. Getting consumers to subscribe to intelligent driving software may take longer. After all, in the Chinese market, hardware is often easier to sell than software and services.

Of course, few players want to be an innovation gambler like Musk: set aside the development of cheaper models to bet on Robotaxi, and a delayed launch can wipe hundreds of billions of dollars off the market value. Most ordinary players just hope that end-to-end intelligent driving software can help the hardware sell better. And if it can also sell at a higher price, so much the better.

*Header image source: Visual China

This article is an original article from Geek Park. For reprinting, please contact Geek Jun on WeChat: geekparkGO
