Automotive Voice Industry Review 2023-2024

The automotive voice interaction market is characterized by the following:

1. In the OEM market, 46 brands made automotive voice standard equipment in 2023.

From 2019 through the first nine months of 2023, both installations and the installation rate of automotive voice kept rising. In the first three quarters of 2023, nearly 12 million vehicles came with pre-installed automotive voice, an installation rate of nearly 80%.

In 2023, 46 passenger car brands had an automotive voice installation rate of 100%, including AITO, Avatr, HiPhi, Rising Auto, ZEEKR, Voyah, Li Auto, Lynk & Co, Tank, NIO, and Xpeng. For the whole of 2023, the number of vehicles equipped with automotive voice exceeds 20 million, with an installation rate above 80%.

2. Automakers’ in-house development of voice is reshaping the voice supply chain.

OEMs’ differentiated demand for intelligent automotive voice and their preference for in-house development allow Tier 2 vendors in the conventional voice supply chain to work directly with OEMs, so the boundaries between the upstream, midstream and downstream of the industry chain are blurring. For example, direct cooperation with AISpeech has helped automakers such as GWM, ZEEKR and Wuling raise the installation and intelligence levels of their voice systems.

This change in industry chain relationships is reshaping the competitive landscape of automotive voice accordingly. By installations from January to September 2023, AISpeech, which supported more than 150 models from over 30 automakers, ranked third.

3. See-and-speak becomes a standard configuration, and advanced functions such as parallel instruction, cross-sound-zone inheritance, offline voice, and out-of-vehicle voice reach production cars.

At the time of ResearchInChina’s China Automotive Voice Industry Report, 2021-2022, “see-and-speak” was installed only by some emerging carmakers and leading Chinese independent brands, the longest continuous conversation lasted only 90 seconds, and dual-sound-zone recognition was still the mainstream solution.

In 2023, “see-and-speak” has become standard on emerging carmakers’ flagship models, with continuous dialogue of up to 120 seconds. Xpeng Motor has also introduced a “Full-time Dialogue at Driver’s Seat” function: when it is turned on, the driver can use see-and-speak simply by looking at the center console screen and speaking, without first waking up the voice assistant. Meanwhile, four-sound-zone recognition has become the new mainstream solution, and Li Auto and Xpeng Motor have also introduced six-sound-zone recognition.
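To make the see-and-speak gating described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Xpeng's implementation: the `Utterance` type, the `see_and_speak` function, the "driver" zone name and the `gaze_on_screen` signal (assumed to come from a driver-monitoring camera) are all hypothetical. The idea shown is that a spoken phrase activates an on-screen item only if it matches text currently visible on the screen, and that the wake word can be skipped only when the full-time driver dialogue feature is on, the speech is localized to the driver's sound zone, and the driver is looking at the screen.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Utterance:
    zone: str             # sound zone the speech was localized to, e.g. "driver"
    text: str             # ASR transcript of what was said
    gaze_on_screen: bool  # hypothetical signal from a driver-monitoring camera


def see_and_speak(utt: Utterance, visible_labels: Sequence[str],
                  full_time_driver_dialogue: bool,
                  assistant_awake: bool) -> Optional[str]:
    """Return the on-screen label the utterance activates, or None."""
    # Wake-word-free path: feature enabled, speaker in the driver zone, eyes on the screen.
    wake_free = (full_time_driver_dialogue
                 and utt.zone == "driver"
                 and utt.gaze_on_screen)
    if not (assistant_awake or wake_free):
        return None  # everyone else must wake the assistant first
    spoken = utt.text.strip().casefold()
    # "See-and-speak": the spoken words must match text currently shown on screen.
    for label in visible_labels:
        if label.casefold() == spoken:
            return label
    return None


labels = ["Navigation", "Music", "Seat Heating"]
print(see_and_speak(Utterance("driver", "music", True), labels, True, False))
print(see_and_speak(Utterance("front_passenger", "music", True), labels, True, False))
```

The first call prints `Music` and the second prints `None`, because the front passenger is not covered by the wake-word-free path. Exact string matching is a deliberate simplification; a production system would presumably tolerate paraphrases and partial matches.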

In addition, more advanced voice functions reached production cars in 2023:
Parallel instruction: supports up to 10 actions in a single instruction;
Cross-sound-zone inheritance: available on Xpeng, ZEEKR and Li Auto models. After one occupant finishes an instruction, other passengers who want the same thing can trigger this function by saying “I want it too”;
Offline voice: more content can be controlled offline. Jiyue 01 supports all-zone, fully offline voice, and still responds to occupants very quickly when offline;
Out-of-vehicle voice: on the Changan Nevo A07, this function allows voice control of the trunk, windows, music, air conditioning, pulling the car out of or into a parking space, and other functions; on the Jiyue 01, it allows voice control of car moving/parking, air conditioning, audio, lights, windows, doors, tailgate and charging port cover.
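Two of the functions above, parallel instruction and cross-sound-zone inheritance, can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the `Cabin` class, the zone names, the `INHERIT_PHRASES` set and the splitting rule are hypothetical and do not describe any automaker's actual system. A parallel instruction is split naively on commas and " and " into at most 10 actions (the cap cited in the text), and a trigger phrase such as "I want it too" replays the most recent instruction's actions for the speaker's own sound zone.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_ACTIONS = 10  # the text cites up to 10 actions per instruction
# Hypothetical trigger phrases for cross-sound-zone inheritance.
INHERIT_PHRASES = {"i want it too", "i want too", "me too"}


@dataclass
class Cabin:
    # (zone, action) pairs; a stand-in for real vehicle control calls
    executed: List[Tuple[str, str]] = field(default_factory=list)
    # actions from the most recent non-inherited instruction
    last_actions: List[str] = field(default_factory=list)

    def handle(self, zone: str, text: str) -> List[str]:
        phrase = text.strip().casefold()
        if phrase in INHERIT_PHRASES:
            # Cross-sound-zone inheritance: repeat the previous instruction for this zone.
            actions = list(self.last_actions)
        else:
            # Parallel instruction: naive split on commas and " and ", capped at MAX_ACTIONS.
            parts = phrase.replace(",", " and ").split(" and ")
            actions = [p.strip() for p in parts if p.strip()][:MAX_ACTIONS]
            self.last_actions = actions
        self.executed.extend((zone, a) for a in actions)
        return actions


cabin = Cabin()
print(cabin.handle("driver", "Open the window, turn on seat heating"))
print(cabin.handle("rear_left", "I want it too"))
```

Both calls print `['open the window', 'turn on seat heating']`; the second run is recorded against the `rear_left` zone. The keyword split is only a placeholder: it would wrongly break phrases where "and" is part of the content (a song title, for example), which is why real systems rely on semantic parsing rather than string splitting.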

4. Voice interaction is the first stop for foundation models entering vehicles in intelligent cockpit scenarios.

The ChatGPT boom has allowed foundation model technology to spread rapidly from AI into other sectors. In 2023, foundation models gained pace in the automotive industry, and quite a few automakers are exploring opportunities to apply them in intelligent cockpit, intelligent driving and other scenarios.

In intelligent cockpit scenarios, voice interaction is the first stop for foundation models entering vehicles. In February 2023, Baidu announced ERNIE Bot, its Chinese counterpart to ChatGPT, and brands such as GWM, Geely and Voyah announced plans to integrate it; in April 2023, Alibaba disclosed that the AliOS intelligent vehicle operating system had been connected to the Tongyi Qianwen foundation model for testing and would later be applied by IM Motors; in August 2023, with Huawei HarmonyOS 4.0, the intelligent assistant Xiaoyi was connected to the Pangu model for the first time, mainly to improve intelligent interaction, scenario arrangement, language understanding, productivity and personalized services.

Besides conventional Internet companies, voice providers such as iFLYTEK, AISpeech and Unisound, which are important foundation model players, have also launched related products.

The iFLYTEK Spark cognitive foundation model has six core capabilities: in-depth understanding of multi-round dialogues, knowledge application, empathetic chat and dialogue, self-guided replies in multi-round dialogues, rapid learning of new knowledge from files, and evolution based on corrective feedback from massive numbers of users;

AISpeech DFM-2 is an industry language foundation model with generalized intelligence. For in-vehicle interaction, AISpeech integrates its Lyra automotive voice assistant with DFM-2, which significantly improves capabilities in planning, creation, knowledge, intervention, plug-ins, multi-level semantic dialogue and documents, and supports multi-modal, multi-intent, multi-sound-zone and all-scenario multi-round continuous dialogue.