High Performance Computing for Automotive

Computers on wheels. That’s how people currently see cars. Practically everything that happens in a vehicle is being monitored and actuated by a microcontroller, from opening windows to calculating the optimal fuel-air mixture for the current torque demand. But the surface has only just been scratched in terms of how much computing power is making its way into vehicles.

Rise of Level 3 autonomous vehicles and robotaxis

The age of autonomous transportation is dawning, and it will bring a new era of computational requirements for the automotive industry. Automated cars require expansive suites of sensors to scan the environment and provide the car with the data it requires to emulate human driving. IDTechEx’s report found that leading SAE level 4 robotaxis have up to 40 individual sensors. Combined with the coming rollout of automated technologies in the vehicle market, this will drive a 10-year CAGR of 13% in the automotive sensors market. However, sensors alone are next to useless without high performance computing to process their data and construct a 3D rendering of the environment to inform the vehicle’s programmed driving policy.

High performance computing (HPC) takes the real-time data outputs from the array of sensors and performs several important processes. Two key challenges it faces are sensor fusion and object classification, and there is some disagreement as to the order in which these should be done. Some favor early fusion, in which all sensor data is combined into a 3D rendering of the scene before an AI algorithm, run by the vehicle’s HPC unit, identifies and labels each detected object. Others favor late fusion, in which an object list is generated from each sensor and the results are then fused. This has the advantage of being able to cross-reference the detections from each sensor and check for agreement. The downside is the challenge of handling discrepancies across the object lists from different sensors.
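The late-fusion idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Detection` record, the `late_fusion` function, the 1.5-metre matching distance, and the sample detections are hypothetical and do not describe any vendor's implementation. Each sensor contributes its own object list, nearby detections from different sensors are grouped, and each group is flagged according to whether the sensors agree on the label.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    sensor: str   # e.g. "camera", "radar", "lidar"
    label: str    # object class reported by that sensor
    x: float      # position in the vehicle frame, metres
    y: float


def late_fusion(object_lists, max_distance=1.5):
    """Greedily group detections from different sensors that lie close together.

    Returns one dict per fused object with the contributing sensors, the
    averaged position, the labels reported, and an agreement flag.
    """
    clusters = []  # each cluster is a list of Detection objects
    for detections in object_lists:
        for det in detections:
            for cluster in clusters:
                cx = sum(d.x for d in cluster) / len(cluster)
                cy = sum(d.y for d in cluster) / len(cluster)
                same_sensor = any(d.sensor == det.sensor for d in cluster)
                if not same_sensor and hypot(det.x - cx, det.y - cy) <= max_distance:
                    cluster.append(det)
                    break
            else:
                # No existing cluster matched, so start a new one.
                clusters.append([det])

    fused = []
    for cluster in clusters:
        labels = {d.label for d in cluster}
        fused.append({
            "sensors": sorted(d.sensor for d in cluster),
            "x": sum(d.x for d in cluster) / len(cluster),
            "y": sum(d.y for d in cluster) / len(cluster),
            "labels": sorted(labels),
            # Agreement needs at least two sensors reporting the same label.
            "agreed": len(labels) == 1 and len(cluster) > 1,
        })
    return fused


# Hypothetical object lists from three sensors.
camera = [Detection("camera", "pedestrian", 10.2, 1.9),
          Detection("camera", "car", 25.0, -3.1)]
radar = [Detection("radar", "car", 24.6, -3.0)]
lidar = [Detection("lidar", "pedestrian", 10.0, 2.1),
         Detection("lidar", "cyclist", 24.8, -2.9)]

for obj in late_fusion([camera, radar, lidar]):
    print(obj)
```

In this toy data the camera and lidar agree on the pedestrian, while the second object is reported as a car by two sensors and as a cyclist by the third. That unresolved group is exactly the kind of discrepancy a late-fusion system has to handle, for example by voting or by weighting each sensor's confidence.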

The autonomous computing SOC

Whether late fusion or early fusion is used, the HPC will still have lots of data processing to do in the form of image processing and running artificial intelligence algorithms for image classification and driving policy. The key components that handle these tasks are graphics processing units (GPUs), central processing units (CPUs), and RAM. Typically, these are separate discrete components; however, the specific requirements of HPC for autonomous cars have led to them being combined onto single chips called systems on chips (SOCs).

These combine GPUs, CPUs, RAM, and more onto a single piece of silicon. The ideal SOC can take in data from all the vehicle’s autonomous driving sensors, process it, identify and classify all detected objects, and create a set of steering, throttle, and brake actuations according to the driving policy. So, the SOC is responsible for the entire autonomous driving system.

Putting all these features on a single chip is key to meeting the computational requirements of autonomous driving. With next to zero physical separation, each part of the chip can exchange data with near-zero latency, next to no noise, and with huge bandwidth. This is opposed to having discrete components spread over a PCB, with more interfaces and more data traces introducing noise and latency.

Automotive SOC performance (TOPS). Source: IDTechEx

GPUs are a key part of the puzzle; their image-processing abilities, combined with their suitability for running AI algorithms based on deep learning and neural networks, make them a cornerstone of the SOC. This is why we see Nvidia gaining traction in the autonomous vehicles space. It has a long history of developing GPUs for graphical applications in computing and has been able to pivot, bringing its expertise to the automotive industry. Its Xavier and Orin platforms have been staples in autonomous vehicle computational processing.

Mobileye is another pioneer in this industry. Founded in 1999, it quickly made a name for itself, attracting the interest of Intel, which went on to acquire it. It is now public again and has found its way into many consumer vehicles, powering ADAS applications.

Mobileye and Nvidia have been increasing their computation power recently, progressing from a few TOPS (tera operations per second) to tens of TOPS, now hundreds of TOPS, and targeting thousands of TOPS. The main source of these improvements is the adoption of ever-smaller node sizes from their supporting foundries, such as TSMC and Samsung.

Growing semiconductor technologies

In recent years, IDTechEx has seen Mobileye, Nvidia, and others move from 28nm in 2018 to 7nm FinFET solutions and below in 2021. However, the foundries are now producing sub-5nm technologies and heading toward sub-1nm technologies in the future. IDTechEx has seen that each time the node size halves, the computation power increases by a factor of 10. But chasing smaller and smaller node sizes will get more and more expensive. A single 300mm wafer of TSMC’s 3nm technology costs in the region of US$20,000, and that price will continue to grow as sub-3nm technologies are demanded in a variety of industries, from conventional computing applications such as phones, laptops, and PCs to the new demand coming from the automotive sector.
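The rule of thumb quoted above (a tenfold gain for every halving of the node size) can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: the function name `projected_speedup` is hypothetical, and extending the rule to fractional halvings or to future nodes is an assumption rather than an IDTechEx result.

```python
from math import log2


def projected_speedup(old_node_nm, new_node_nm, gain_per_halving=10.0):
    """Speed-up implied by a fixed gain for every halving of the node size."""
    halvings = log2(old_node_nm / new_node_nm)  # may be fractional
    return gain_per_halving ** halvings


for old, new in [(28, 14), (28, 7), (7, 3)]:
    print(f"{old}nm -> {new}nm: about {projected_speedup(old, new):.0f}x")
```

Under this rule, the move from 28nm to 7nm is two halvings and so implies roughly a hundredfold gain, which is in line with the progression from a few TOPS to hundreds of TOPS.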

As such, automotive HPC developers need to think about how they can optimize existing technologies to get the most performance. One approach that IDTechEx is seeing is an increased focus on artificial intelligence (AI), neural network (NN), and deep learning (DL) accelerators. These use new AI-enhanced data processing strategies, reducing the reliance on classical approaches found in the GPU. This can boost the performance of the chip very cheaply, requiring less investment in smaller node technologies and even producing an overall efficiency benefit.

IDTechEx is seeing AI becoming more common in SOC block diagrams from leading tier 2s like Mobileye and Renesas. But one particularly interesting prospect is Recogni. Recogni is a start-up that has developed an AI accelerator for autonomous driving SOC applications that promises game-changing computational power and efficiencies.

Even with shrinking node sizes and imaginative AI solutions, the wider computer chip industry is still seeing the pace of development slow. Moore’s law, a decades-old empirical observation that held strong until recently, says that the number of transistors on a chip, and with it computational power, should double roughly every two years. Some say that Moore’s law is beginning to slow as the industry faces increasingly difficult technological challenges to achieve smaller and smaller incremental gains. Others say Moore’s law is dead.

A prominent solution to deal with the slowdown of Moore’s Law and the substantial increase in the manufacturing cost of monolithic integrated circuits (ICs) is the concept of “chiplets.” The core concept of chiplets involves deconstructing a monolithic IC into distinct functional blocks, transforming these blocks into separate chiplets, and subsequently reassembling them at the packaging level. The ultimate goal of a chiplet-based processor is to maintain or enhance performance while reducing the overall production expenses compared to traditional monolithic ICs.
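One way to see why splitting a large die can reduce cost is a simple yield calculation. The sketch below is a rough illustration under stated assumptions, not a costing method taken from IDTechEx: it uses the textbook Poisson yield model (yield = exp(-area x defect density)), a hypothetical defect density of 0.001 defects per mm2, hypothetical die areas, and the roughly US$20,000 wafer price mentioned earlier. It ignores wafer edge loss, packaging cost, and the extra silicon needed for die-to-die interfaces.

```python
from math import exp, pi


def cost_per_good_die(die_area_mm2, wafer_cost_usd=20_000.0,
                      wafer_diameter_mm=300.0, defects_per_mm2=0.001):
    """Wafer cost divided by the expected number of defect-free dies."""
    wafer_area = pi * (wafer_diameter_mm / 2) ** 2
    dies_per_wafer = wafer_area / die_area_mm2              # ignores edge loss
    yield_fraction = exp(-die_area_mm2 * defects_per_mm2)   # Poisson yield model
    return wafer_cost_usd / (dies_per_wafer * yield_fraction)


# Hypothetical comparison: one 600 mm2 monolithic die versus the same
# silicon area split into four 150 mm2 chiplets.
monolithic = cost_per_good_die(600.0)
four_chiplets = 4 * cost_per_good_die(150.0)

print(f"one 600 mm2 die:       ${monolithic:,.0f}")
print(f"four 150 mm2 chiplets: ${four_chiplets:,.0f}")
```

Because a defect scraps only the small chiplet it lands on rather than the whole large die, the silicon cost of the chiplet version comes out lower in this model. Whether the saving survives in practice depends on the packaging and interconnect costs that the sketch leaves out.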

The efficacy of chiplet design relies heavily on packaging techniques, particularly those used to interconnect multiple chiplets, as they significantly affect overall system performance. These techniques, which include 2.5D IC, 3D IC, and high-density fan-out wafer-level packaging, are collectively referred to as “advanced semiconductor packaging.” They allow multiple chiplets, often produced at different process nodes, to be brought together on a single substrate. This is made possible by compact bump sizes, which enable higher interconnect densities and superior integration capabilities.

The server CPU sector illustrates the current state of advanced semiconductor packaging in the industry. While most contemporary server CPUs are built around monolithic system-on-chip (SOC) designs, notable developments have emerged. In 2021, Intel announced its forthcoming server CPU, Sapphire Rapids, which will be constructed as a four-chip module interconnected via Intel’s Embedded Multi-die Interconnect Bridge (EMIB), a 2.5D advanced semiconductor packaging solution.

Concurrently, AMD has embraced the power of 3D advanced semiconductor packaging techniques to enhance server CPU performance. In the case of its latest server CPU, Milan-X (released in March 2022), AMD employs a 3D packaging strategy that involves stacking a cache die directly atop the processor. This innovation results in a remarkable >200x interconnect density boost compared to conventional 2D packaging, according to AMD’s claims. These developments are not limited to CPUs alone; the data center domain has also witnessed the integration of advanced semiconductor packaging technologies for other components, such as accelerators. NVIDIA, a key player, has been utilizing TSMC’s 2.5D packaging technology known as Chip on Wafer on Substrate (CoWoS) for its high-end GPU accelerators since 2016.

This surge of adoption, exemplified by both Intel and AMD in their cutting-edge products, points to a widening utilization of advanced semiconductor packaging technologies across the industry. The trend spans beyond server CPUs, encompassing a variety of data center components. As the industry landscape evolves, these innovative packaging methodologies are poised to play a pivotal role in enhancing performance, integration, and efficiency.

Over the next 10-15 years, driven by escalating processing requirements and the need for substantial bandwidth at minimal power consumption, the automotive sector will follow a trajectory similar to that of the cloud and high-performance computing (HPC) market. This trajectory involves integrating diverse intellectual property (IP) and silicon elements at the package level to attain essential features and optimal performance. Autonomous vehicle (AV) computing processors will combine multiple silicon components within the same package, using advanced 2.5D and 3D design approaches.

With increasing demand for high performance computing in vehicles and a need for continual performance growth, there will be a rapid evolution in the technology that goes into automotive computers. Sub-3nm node sizes, chiplet designs, increased reliance on AI acceleration, 2.5D packaging, and even 3D packaging will all become a normal part of HPC for autonomous technologies in cars. Computers have been in cars for decades, but the technologies coming down the line will make an average car today look like landline technology in a world of smartphones.