Extra “eye” movements are the key to better self-driving cars

Andrea Benucci and colleagues at the RIKEN Center for Brain Science have developed a way to create artificial neural networks that learn to recognize objects faster and more accurately. The study, recently published in the scientific journal PLOS Computational Biology, focuses on all the unnoticed eye movements that we make, and shows that they serve a vital purpose in allowing us to stably recognize objects. These findings can be applied to machine vision, for example by making it easier for self-driving cars to learn how to recognize important features on the road.

We make constant head and eye movements throughout the day, yet objects in the world do not blur or become unrecognizable, even though the physical information hitting our retinas changes constantly. What likely makes this perceptual stability possible are neural copies of the movement commands. These copies are sent throughout the brain each time we move and are thought to allow the brain to account for our own movements and keep our perception stable.

In addition to stable perception, evidence suggests that eye movements, and their motor copies, might also help us to stably recognize objects in the world, but how this happens remains a mystery. Benucci developed a convolutional neural network (CNN) that offers a solution to this problem. The CNN was designed to optimize the classification of objects in a visual scene while the eyes are moving.

First, the network was trained to classify 60,000 black and white images into 10 categories. It performed well on these images, but when it was tested with shifted images that mimicked the altered visual input produced by eye movements, performance dropped drastically to chance level. However, classification improved significantly after the network was trained with shifted images, as long as the direction and size of the eye movements that resulted in the shift were also included.
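
To make the idea concrete, here is a minimal sketch of how a shifted training example might be paired with the movement that produced it. It is an illustration only, not the study's code: the function names, the zero fill for pixels that move out of view, and the `max_shift` parameter are all assumptions made for this example, and the image is a plain list of pixel rows rather than a real dataset.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Return a copy of `image` shifted dx pixels right and dy pixels down.

    Pixels that have no source inside the original frame are set to `fill`
    (an assumption made for this sketch).
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def make_training_example(image, label, max_shift=2, rng=random):
    """Pair a randomly shifted image with the shift that produced it."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return {
        "pixels": shift_image(image, dx, dy),
        "movement": (dx, dy),  # hypothetical stand-in for the motor copy
        "label": label,
    }
```

A classifier trained on such examples would receive both `pixels` and `movement` as input, so that it knows the direction and size of the shift instead of having to infer them from the image alone.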

In particular, adding the eye movements and their motor copies to the network model allowed the system to better cope with visual noise in the images. “This advancement will help avoid dangerous mistakes in machine vision,” says Benucci. “With more efficient and robust machine vision, it is less likely that pixel alterations—also known as ‘adversarial attacks’—will cause, for example, self-driving cars to label a stop sign as a light pole, or military drones to misclassify a hospital building as an enemy target.”

Bringing these results to real-world machine vision is not as difficult as it seems. As Benucci explains, “the benefits of mimicking eye movements and their efferent copies implies that ‘forcing’ a machine-vision sensor to have controlled types of movements, while informing the vision network in charge of processing the associated images about the self-generated movements, would make machine vision more robust, and akin to what is experienced in human vision.”

The next step in this research will involve collaboration with colleagues working with neuromorphic technologies. The idea is to implement actual silicon-based circuits based on the principles highlighted in this study and test whether they improve machine-vision capabilities in real-world applications.