Automotive AI Algorithm and Foundation Model Application Review

Large AI model research: NOA and foundation models are disrupting the ADAS industry.

Recent events have unsettled OEMs and small and medium-sized ADAS companies, as the autonomous driving industry is changing faster than most people expected.

At major automotive forums in 2022, ADAS companies focused on promoting their driving-parking integrated solutions, and many of them built up large inventories in anticipation of a market boom in 2023. In 2023, however, under cost-reduction pressure, OEMs have yet to deploy driving-parking integration at scale. Instead, driven by Huawei, Haomo.ai, Baidu and emerging carmakers, the focus of competition and promotion has shifted directly to highway NOA and urban NOA.

According to a self-media report confirmed through industry channels, a highway NOA project that an OEM in Southwest China had commissioned multiple medium-sized Tier 1 suppliers to develop jointly has fallen through, and the OEM has turned to two other Tier 1 suppliers, DJI and Huawei, to carry out the project. Huawei, which originally worked on NOA for high-end models, has this year started launching low- and mid-end solutions that compete with small and medium-sized ADAS Tier 1 suppliers.

Meanwhile, emerging carmakers have begun to race on the number of cities where they offer NOA. In late August 2023, Tesla demonstrated in a live stream the autonomous driving performance of FSD V12, billed as the world’s first end-to-end AI-driven autonomous driving system. Elon Musk said that FSD V12 is entirely AI-driven: it contains no hand-written code for recognizing roads and pedestrians, and neural networks are used throughout. The C++ code of FSD V12 has been cut by a factor of ten, from over 20,000 lines to 2,000 lines. 99% of Tesla’s decisions are made by neural networks that take visual inputs and produce control outputs, so FSD V12 works much like a human brain. In addition, FSD V12 was trained on a mass of “video data” using 10,000 H100 GPUs.

The autonomous driving performance of FSD V12 is impressive. With the support of large AI models, autonomous driving has reached a turning point. FSD V12 is reported to enter China in 2024 and to be rolled out widely in vehicles in 2025 after being trained on Chinese roads. On September 12, 2023, Yu Chengdong said that AITO’s urban NOA would be available in major cities across China in late 2023 (according to Huawei, being available in a city does not mean covering all roads in that city; vehicles need to run on structured roads with clear road boundaries). Tesla and Huawei, two benchmarks of autonomous driving, put heavy pressure on other automakers and ADAS Tier 1 suppliers.

Where will the ADAS industry go? How should OEMs and ADAS companies deal with the challenges posed by large AI models and NOA? The Automotive AI Algorithm and Foundation Model Application Research Report, 2023 reviews the development history of ADAS algorithms and large AI models, and explores the development trends of large AI models in automotive applications.

What changes will end-to-end autonomous driving bring?

Autonomous driving algorithm systems fall into two categories: end-to-end autonomous driving and modular autonomous driving. A modular autonomous driving system has three layers: the environmental perception layer, the decision and planning layer, and the control and execution layer. Different teams are responsible for different modules, which allows a clearer division of labor and improves development efficiency. The shortcoming is that the overall system is very large and complex, and requires manual design of hundreds or thousands of modules.
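The three-layer structure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names (perceive, plan, control, drive_step), the 100 m sensing range, the 30 m safe gap and the proportional gain are hypothetical and do not come from any vendor's stack. The point is only that each layer is a separate, hand-designed unit with an explicit interface to the next.

```python
from dataclasses import dataclass


@dataclass
class Obstacle:
    distance_m: float  # distance ahead of the ego vehicle


@dataclass
class Plan:
    target_speed_mps: float


@dataclass
class Command:
    throttle: float  # pedal opening, 0..1
    brake: float     # pedal opening, 0..1


def perceive(raw_ranges_m):
    """Perception layer: turn raw range readings into obstacle objects."""
    return [Obstacle(d) for d in raw_ranges_m if d < 100.0]


def plan(obstacles, cruise_speed_mps=20.0, safe_gap_m=30.0):
    """Decision and planning layer: a hand-written rule picks the target speed."""
    if obstacles and min(o.distance_m for o in obstacles) < safe_gap_m:
        return Plan(target_speed_mps=0.0)  # stop for a close obstacle
    return Plan(target_speed_mps=cruise_speed_mps)


def control(plan_, current_speed_mps, gain=0.1):
    """Control and execution layer: proportional speed controller."""
    error = plan_.target_speed_mps - current_speed_mps
    effort = max(-1.0, min(1.0, gain * error))  # clamp to pedal range
    if effort >= 0:
        return Command(throttle=effort, brake=0.0)
    return Command(throttle=0.0, brake=-effort)


def drive_step(raw_ranges_m, current_speed_mps):
    """Run the three layers in sequence for one control cycle."""
    return control(plan(perceive(raw_ranges_m)), current_speed_mps)
```

For example, drive_step([20.0], 10.0) brakes because the only obstacle is inside the assumed safe gap. Every new scenario would need another rule in plan or another module, which is the complexity the paragraph above describes.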

End-to-end autonomous driving means that the vehicle feeds the information collected by its sensors (raw image data, raw point cloud data, etc.) directly into a single deep learning neural network, which processes it and directly outputs the driving commands of the autonomous vehicle (steering wheel angle, steering wheel speed, accelerator pedal opening, brake pedal opening, etc.). In end-to-end autonomous driving there are no complicated hand-designed rules, and with only a very small amount of human training data the deep learning neural network can learn to drive, regardless of HD map coverage.
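As a rough illustration of that interface, the sketch below maps a flattened list of sensor values straight to driving commands through one small network. It is a toy under stated assumptions: the names make_policy and drive are hypothetical, the network has a single hidden layer, and the weights are random rather than trained, so the outputs are meaningless as driving behavior. It only shows that no perception, planning or control rules sit between input and output.

```python
import math
import random


def make_policy(n_inputs, n_hidden=8, seed=0):
    """Create random (untrained) weights for a one-hidden-layer network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(3)]
    return w1, w2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def drive(policy, pixels):
    """Map raw sensor values directly to driving commands."""
    w1, w2 = policy
    # Hidden layer: weighted sum of raw inputs followed by tanh.
    hidden = [math.tanh(sum(w * p for w, p in zip(row, pixels))) for row in w1]
    # Output layer: one raw value each for steering, throttle and brake.
    steer_raw, throttle_raw, brake_raw = (
        sum(w * h for w, h in zip(row, hidden)) for row in w2
    )
    return {
        "steering": math.tanh(steer_raw),   # -1 (full left) .. 1 (full right)
        "throttle": sigmoid(throttle_raw),  # 0 .. 1 pedal opening
        "brake": sigmoid(brake_raw),        # 0 .. 1 pedal opening
    }
```

In a real system the weights would be learned from recorded driving data, and the input would be camera frames or point clouds rather than a short list of numbers.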

When autonomous driving evolves to the urban NOA stage, modular autonomous driving algorithms will no longer meet the needs, and end-to-end autonomous driving algorithms will begin to become mainstream. Furthermore, most of the accumulated modular autonomous driving algorithms cannot be migrated to end-to-end autonomous driving, so developers have to start afresh. It is therefore not too late for BYD, which only began investing heavily in autonomous driving development in 2023. BYD said that combining foundation model technologies such as BEV perception is an opportunity for it to overtake on the bend in advanced intelligent driving, and that it is developing some distinctive ADAS functions by combining intelligent driving with the Yisifang platform.

BYD’s idea of overtaking on the bend is to skip modular autonomous driving and move directly to end-to-end autonomous driving based on large AI models.

Early autonomous driving perception algorithms were based on conventional computer vision technology. After 2010, as deep learning developed, neural networks were introduced into autonomous driving perception algorithms, bringing a qualitative improvement in the perception performance of autonomous vehicles. Neural network models applied at the perception layer fall into two categories: small models such as CNN and RNN, and Transformer foundation models.

Transformer is a neural network model based on the attention mechanism. It was proposed in Google’s 2017 paper titled “Attention Is All You Need”. Transformer is superior to RNN in its ability to compute in parallel and to handle long input sequences. Compared with CNN, Transformer preserves positional information and captures dependencies between distant features. Transformer has therefore become one of the most popular models in natural language processing. Tesla was the first to introduce Transformer into autonomous driving algorithms, and other emerging carmakers and new brands of conventional automakers have followed suit.
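The core operation in that paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The pure-Python sketch below is a minimal single-head version written for readability; the function names and the list-of-lists representation are choices made here, and it omits the learned projections, multiple heads and positional encodings that a full Transformer adds.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

Each query is processed independently of the others, which is why the computation parallelizes well, and every query can attend to every key regardless of how far apart the two positions are in the sequence.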

Why are large AI models needed?

Needed by urban NOA: currently OEMs are expanding from highway NOA to urban NOA. The expansion from highway scenarios to urban scenarios means vehicles face far more long-tail problems (corner cases). The highway scenario is relatively closed and confined to specific road sections, with a highly standardized traffic environment, and highway driving rules clearly define how vehicles should behave; traffic participants are simple, with no pedestrians involved, and driving states are more predictable. All of this makes the highway the first scenario in which NOA is implemented.

Yet in urban scenarios, complex roads and road conditions (intersections with traffic lights), multiple types of traffic participants (pedestrians, low-speed two-wheelers), and high scenario heterogeneity (road conditions differ greatly between cities and even between road sections) combine to cause a surge in corner cases in autonomous driving. The implementation of urban NOA therefore requires autonomous driving models with stronger generalization capabilities. Considering the cost of commercial application, we believe that applying large AI models to improve generalization capabilities while reducing or controlling vehicle hardware cost is the key to the evolution of autonomous driving algorithms.

Needed to get rid of HD maps and lower cost: before 2022, Chinese OEMs generally implemented urban NOA using the HD map + single-vehicle perception solution. Yet in the implementation process, they found that HD maps pose three big problems: 1) inability to achieve real-time updates; 2) regulatory risks; 3) high cost. Upgrading autonomous driving perception algorithms to the BEV+Transformer architecture helps urban NOA dispense with HD maps.

The BEV perception model copes far better with extreme weather conditions: in the post-fusion model, each camera judges targets on its own before the results are merged. In extreme weather such as rain and snow, the resolution of the data/video streams collected by the cameras is much lower, so it is difficult for any single camera's output to meet its own acceptance criteria, and the results passed to the backend for planning and control degrade sharply. In the BEV model, by contrast, converting the different camera views into the bird's-eye view involves feature-level fusion. In extreme weather, for example, some of the light captured by the cameras still carries information about obstacles ahead, and this information remains available for subsequent planning and control. Under the framework of feature-level fusion, the perception model makes far fuller use of the data.
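The difference between the two fusion strategies can be shown with a deliberately simplified numeric sketch. It is not how a BEV network works internally: a real model fuses learned feature maps, whereas this toy uses one scalar obstacle score per camera. The 0.5 threshold, the function names and the noisy-OR combination rule (which treats the cameras as independent) are all assumptions chosen for illustration.

```python
def late_fusion(camera_scores, threshold=0.5):
    """Post fusion: each camera decides alone, then decisions are merged."""
    per_camera_decisions = [score >= threshold for score in camera_scores]
    return any(per_camera_decisions)


def feature_level_fusion(camera_scores, threshold=0.5):
    """Feature-level fusion: pool the weak evidence first, then decide once.

    Uses a noisy-OR rule: the obstacle is missed only if every camera
    misses it, assuming the cameras are independent.
    """
    miss_probability = 1.0
    for score in camera_scores:
        miss_probability *= 1.0 - score
    fused_score = 1.0 - miss_probability
    return fused_score >= threshold


# Three cameras in heavy rain: each sees only weak evidence of an obstacle.
rainy_scores = [0.40, 0.45, 0.35]
print(late_fusion(rainy_scores))           # no single camera passes its threshold
print(feature_level_fusion(rainy_scores))  # pooled evidence passes the threshold
```

With these assumed scores, every camera falls below the threshold on its own, so post fusion discards all three readings, while pooling them first yields a combined score of about 0.79 and the obstacle is kept.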

Large AI models not only find successful application in autonomous driving, but also have a promising future in intelligent cockpits. Ge Yuming, the head of the C-V2X Working Group of the IMT-2020 (5G) Promotion Group, says that foundation models have three impacts on intelligent cockpits:

Firstly, the context understanding capability of foundation models can enhance the voice assistant’s ability to understand and respond to passengers’ speech and meaning, and enable such functions as continuous dialogue, dialogue with memory, and proactive interaction;
Secondly, foundation models can give vehicle assistants multimodal understanding and perception capabilities, reducing the driver’s interaction burden;
Thirdly, in terms of maps, the accuracy of vehicle navigation in route optimization and judgment will improve with the application of foundation models.

How to deal with the challenges posed by large AI models?

Big data and computing power are important prerequisites for the application of large AI models. A Transformer model requires mileage data of 100 million kilometers or more before the accumulated quantity of data turns into a qualitative improvement in performance. The raw data collected by sensors also needs to be annotated before it can be used to train algorithm models, and automatic annotation tools can greatly improve data processing speed. Since 2018, Tesla’s data annotation has gradually developed from 2D manual annotation to 4D spatial automatic annotation; Chinese providers like Xpeng and Haomo.ai have also announced automatic annotation tools, bringing much higher annotation efficiency. In addition to real data, simulation scenes are an important solution to the problem of insufficient data for training foundation models.

Generative AI is expected to significantly enhance the generalization capabilities of simulation scenes, and help OEMs use more simulation scene data, thereby improving the iteration speed of autonomous driving models and shortening the development cycle.

High computing power is another important condition for Transformer model training, and supercomputing centers have thus become important infrastructure for autonomous driving providers. Tesla’s AI computing center Dojo uses a total of 14,000 NVIDIA GPUs to train AI models, increasing network training speed by 30%. Among Chinese providers, Xpeng and Alibaba jointly funded and created “Fuyao”, an autonomous driving AI computing center which allows for 170 times faster training of autonomous driving algorithm models.

Apart from a few well-funded OEMs such as NIO, Xpeng and Li Auto that already work on foundation model applications, most OEMs find it difficult to invest simultaneously in big data, high computing power, supercomputing centers and in-house AI chip development as Tesla does. For them, drawing on the strength of large AI model providers to apply foundation models is a relatively pragmatic approach.

For small and medium-sized ADAS Tier 1 suppliers, it is almost impossible to independently launch NOA solutions that can rival Huawei NCA, and the huge investment required for intelligent computing centers is an insurmountable barrier. Meanwhile, mainstream automakers are turning to independent development of autonomous driving, leaving ever less room for small and medium-sized ADAS Tier 1 suppliers. Industry consolidation is inevitable. These suppliers need to quicken the pace of going public so that they can become consolidators themselves, or seek opportunities to be acquired.

In this industry disruption, there are both risks and opportunities. Huawei and other IT giants have high staff costs, and OEMs are unwilling to be constrained by these giants, so small and medium-sized ADAS Tier 1 suppliers still have room to survive. As a next step, resource integration is critical for them. Amid the trend toward cockpit-driving integration and cross-domain integration, close partnerships with large AI model companies (Unisound, SenseTime, AISpeech, etc.), listed cockpit companies and chassis domain companies, among others, are all options.

Large AI models also bring opportunities to latecomer AI chip companies. Not all the high-compute chips of conventional intelligent driving SoC and cockpit SoC vendors meet the needs of large AI models. Emerging AI chip companies can design high-compute chips specifically for the requirements of foundation models, and grow rapidly on the back of customization demand from OEMs.

According to Yu Kai, founder and CEO of Horizon Robotics, China is more than five years ahead of other countries in vehicle intelligence applications. NOA and foundation models will help to further widen the gap between Chinese companies and foreign Tier 1 suppliers. After establishing their foothold, China’s local Tier 1 suppliers are expected to have opportunities to partner with foreign OEMs and Tier 1 giants.

As Mobileye’s CEO Amnon Shashua said, “If you cannot win in China, you cannot win globally.”