Mobileye’s Self-Driving Secret? 200PB of Data

Like all drivers, autonomous vehicles will face a “long tail” of problems in which a self-driving vehicle encounters something it has not seen or experienced before. An example would be a tractor covered in snow, as shown here. Mobileye’s state-of-the-art computer vision coupled with extremely capable natural language models allows for hard mining of Mobileye’s 200 petabytes of data, delivering thousands of results within seconds, even for extremely rare conditions and scenarios

Mobileye describes the computer vision tech and natural language models that use an industry-leading dataset for autonomous vehicle training.

What’s New: Mobileye is sitting on a virtual treasure trove of driving data – some 200 petabytes worth. When combined with Mobileye’s state-of-the-art computer vision technology and extremely capable natural language understanding (NLU) models, the dataset can deliver thousands of results within seconds, even for incidents that fall into the “long tail” of rare conditions and scenarios. This helps the AV and state-of-the-art computer vision system handle edge cases and thereby achieve the very high mean time between failure (MTBF) rate targeted for self-driving vehicles.

“Data and the infrastructure in place to harness it is the hidden complexity of autonomous driving. Mobileye has spent 25 years collecting and analyzing what we believe to be the industry’s leading database of real-world and simulated driving experience, setting Mobileye apart by enabling highly capable AV solutions that meet the high bar for mean time between failure.”
― Prof. Amnon Shashua, Mobileye president and chief executive officer

How It Works: Mobileye’s database – believed to be the world’s largest automotive dataset – comprises more than 200 petabytes of driving footage, equivalent to 16 million 1-minute driving clips from 25 years of real-world driving. Those 200 petabytes are stored between Amazon Web Services (AWS) and on-premise systems. The sheer size of Mobileye’s dataset makes the company one of AWS’s largest customers by volume stored globally.

Large-scale data labeling is at the heart of building powerful computer vision engines needed for autonomous driving. Mobileye’s rich and relevant dataset is annotated both automatically and manually by a team of more than 2,500 specialized annotators. The compute engine relies on 500,000 peak CPU cores at the AWS cloud to crunch 50 million datasets monthly – the equivalent to 100 petabytes being processed every month related to 500,000 hours of driving.

Why It Matters: Data is only valuable if you can make sense of it and put it to use. This requires deep comprehension of natural language along with state-of-the-art computer vision, Mobileye’s long-standing strength.

Every AV player faces the “long tail” problem in which a self-driving vehicle encounters something it has not seen or experienced before. This long tail contains large datasets, but many do not have the tools to effectively make sense of it. Mobileye’s state-of-the-art computer vision technology combined with extremely capable NLU models enable Mobileye to query the dataset and return thousands of results within the long tail within seconds. Mobileye can then use this to train its computer vision system and make it even more capable. Mobileye’s approach dramatically accelerates the development cycle.

What Is Included: Mobileye’s team uses an in-house search engine database with millions of images, video clips and scenarios. They include anything from “tractor covered in snow” to “traffic light in low sun,” all collected by Mobileye and feeding its algorithms. (See sample images).

More Context: With access to the industry’s highest-quality data and the talent required to put it to use, Mobileye’s driving policy can make sound, informed decisions deterministically, an approach that removes the uncertainty of artificial intelligence-based decisions and yields a statistically high mean time between failure rate. At the same time, the dataset hastens the development cycle to bring the lifesaving promise of AV technology to reality more quickly.