June 20, 2016
Computers only recently began to get the software needed to discern unknown objects; now machine learning takes computer vision to the next level with a system that can describe objects and put them into context. Coming soon: better visual search?
Computer software only recently became smart enough to recognize objects in photographs. Now, Stanford researchers using machine learning have created a system that takes the next step, writing a simple story of what’s happening in any digital image.
“The system can analyze an unknown image and explain it in words and phrases that make sense,” said Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab.
“This is an important milestone,” Li said. “It’s the first time we’ve had a computer vision system that could tell a basic story about an unknown image by identifying discrete objects and also putting them into some context.”
Humans, Li said, create mental stories that put what we see into context. “Telling a story about a picture turns out to be a core element of human visual intelligence, but so far it has proven very difficult to do this with computer algorithms,” she said.
At the heart of the Stanford system are algorithms that improve its accuracy by scanning scene after scene, looking for patterns, and then using the accumulated descriptions of previous scenes to extrapolate what is depicted in the next unknown image.
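To make the idea of reusing previously described scenes concrete, here is a toy sketch in Python. It is not the Stanford system, whose internals this article does not detail. It swaps in a much simpler technique, nearest-neighbor retrieval: each remembered scene is reduced to a set of object labels, and a new scene borrows the description of the most similar remembered one. The names `SceneMemory` and `jaccard`, and the example scenes, are hypothetical and chosen only for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of object labels, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class SceneMemory:
    """Hypothetical store of previously described scenes (illustration only)."""

    def __init__(self):
        self.scenes = []  # list of (object labels, description) pairs

    def add(self, objects, description):
        # Remember one scene together with its human-written description.
        self.scenes.append((frozenset(objects), description))

    def describe(self, objects):
        # Borrow the description of the most similar remembered scene.
        if not self.scenes:
            return None
        best = max(self.scenes, key=lambda scene: jaccard(scene[0], objects))
        return best[1]


memory = SceneMemory()
memory.add({"dog", "frisbee", "grass"}, "a dog catches a frisbee on the grass")
memory.add({"man", "bicycle", "street"}, "a man rides a bicycle down a street")
memory.add({"cat", "sofa"}, "a cat sleeps on a sofa")

# An unseen scene shares "dog" and "grass" with the first remembered scene.
print(memory.describe({"dog", "grass", "ball"}))
```

In this sketch, the more described scenes the memory holds, the more likely a new image finds a close match. That mirrors, in a very simplified way, the point that accuracy improves as the system scans scene after scene.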