Computer Eyesight Gets a Lot More Accurate

AUGUST 18, 2014 8:01 PM

Just as the Big Bad Wolf promised Little Red Riding Hood that his bigger eyes were “the better to see you with,” a machine’s ability to see the world around it is benefiting from bigger computers and more accurate mathematical calculations.

The improvement was visible in contest results released Monday evening by computer scientists and companies that sponsor an annual challenge to measure improvements in the state of machine vision technology.

Started in 2010 by Stanford, Princeton and Columbia University scientists, the Large Scale Visual Recognition Challenge this year drew 38 entrants from 13 countries. The groups use advanced software, in most cases modeled loosely on biological vision systems, to detect, locate and classify a huge set of images taken from Internet sources like Twitter. The contest was sponsored this year by Google, Stanford, Facebook and the University of North Carolina.

Contestants run their recognition programs on high-performance computers based in many cases on specialized processors called G.P.U.s, for graphics processing units.

This year there were six categories based on detecting, locating and classifying objects. Winners included the National University of Singapore, Oxford University, Adobe Systems, the Center for Intelligent Perception and Computing at the Chinese Academy of Sciences, as well as Google in two separate categories.

Accuracy almost doubled in the 2014 competition and error rates were cut in half, according to the conference organizers.

“This year is really what I consider a historical year for the challenge,” said Fei-Fei Li, the director of the Stanford Artificial Intelligence Laboratory and one of the creators of a vast set of labeled digital images that is the basis for the contest. “What really excites us is that performance has taken a huge leap.”

Although the contest is based on pattern recognition software that can be “trained” to recognize objects in digital images, the contest itself is made possible by the Imagenet database, an immense collection of more than 14 million images that have been identified by humans. The Imagenet database is publicly available to researchers at http://image-net.org/.

In the five years that the contest has been held, the organizers have twice, once in 2012 and again this year, seen striking improvements in accuracy, accompanied by more sophisticated algorithms and larger and faster computers.

In 2012 the contest was won by Geoffrey E. Hinton, a cognitive scientist at the University of Toronto, and two of his students. Mr. Hinton is a pioneer in the field of artificial neural networks, and in 2013 he joined Google with his students Alex Krizhevsky and Ilya Sutskever.

This year the entrants had the option of either disclosing the details of their algorithms or keeping them proprietary, and all of the winning groups chose to share details of their technical innovations. That was significant, according to Dr. Li, because it is possible to move quickly from research to commercial applications.

Machine vision has countless applications, including computer gaming, medical diagnosis, factory robotics and automotive safety systems. Recently a number of carmakers have added systems that recognize pedestrians and bicyclists and stop the car automatically without driver intervention.

“We see innovation and creativity exploding,” Dr. Li said. “The algorithms are more complex and they are just more interesting.”

This year almost all of the entrants used a variant of an approach known as a convolutional neural network, which was first refined in 1998 by Yann LeCun, a French computer scientist who recently became director of artificial intelligence research at Facebook.

“This is LeCun’s hour,” said Gary Bradski, an artificial intelligence researcher who was the founder of OpenCV, a widely used machine vision library of software tools. Convolutional neural networks have only recently begun to have an impact because of the sharply falling cost of computing, he said. “In the past there were a lot of things people didn’t do because no one realized there would be so much inexpensive computing power available.”
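For readers curious about what the “convolutional” in the name refers to, the following is a minimal Python sketch of the basic operation: a small grid of weights slides across an image and produces a map of where a pattern appears. It is an illustration only, not code from any contest entrant. The function name, the toy image and the hand-picked filter are assumptions made for this example; real networks learn their filter weights from labeled images and stack many such layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map.

    Only positions where the kernel fits entirely inside the image are used.
    As in most neural-network software, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image" (hypothetical data): dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
# In a trained network these weights would be learned, not written by hand.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

In this toy case the feature map is nonzero only in its middle column, which is where the dark and bright halves of the image meet.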

The accuracy results this year improved to 43.9 percent, from 22.5 percent, and the error rate fell to 6.6 percent, from 11.7 percent, according to Olga Russakovsky, a Stanford University graduate researcher who is the lead organizer for the contest. Since the Imagenet Challenge began in 2010, the classification error rate has decreased fourfold, she said.
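The organizers’ summary that accuracy “almost doubled” and error rates were “cut in half” can be checked against these figures with a few lines of arithmetic. The sketch below is in Python; the variable and function names are illustrative, and it treats the accuracy and error figures as two separate measurements, which is an assumption based on the fact that they do not add up to 100 percent.

```python
# Figures quoted by the contest organizers, in percent.
accuracy_2013, accuracy_2014 = 22.5, 43.9
error_2013, error_2014 = 11.7, 6.6


def ratio(new, old):
    """Return new divided by old, rounded to two decimal places."""
    return round(new / old, 2)


# How many times higher the 2014 accuracy is than the 2013 accuracy.
accuracy_gain = ratio(accuracy_2014, accuracy_2013)

# The 2014 error rate as a fraction of the 2013 error rate.
error_ratio = ratio(error_2014, error_2013)

# The same change in error, expressed as a percentage reduction.
relative_error_drop = round((error_2013 - error_2014) / error_2013 * 100, 1)

print(accuracy_gain)        # 1.95
print(error_ratio)          # 0.56
print(relative_error_drop)  # 43.6
```

By this reckoning accuracy rose to about 1.95 times its previous level, and the error rate fell by about 44 percent, somewhat short of a full halving.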

Despite the increases in computer vision accuracy, the systems still cannot match human vision, according to the researchers.

“Human-level understanding is much deeper than machine image classification,” Ms. Russakovsky said. “I can easily find an image that will fool the algorithm and I can’t do it with humans, but we’re making significant progress.”

Although machines have made great progress in object recognition, they are only taking baby steps in what scientists describe as “scene understanding,” the ability to describe in human language what is happening in an image.

“I really believe in the phrase that ‘a picture is worth a thousand words,’ not a thousand disconnected words,” said Dr. Li. “It’s the ability to tell a complete story. That is the holy grail.”
