One engineer is trying to teach computers to watch TV and to recognize what they are seeing.

Ben Taskar, the Magerman Term Assistant Professor in the Department of Computer and Information Science, is programming computers to recognize video content — such as plot, characters and even setting. This information will then make tagging and searching for videos online much simpler.

Using algorithms combining video, sound and text streams, Taskar and his team of graduate and undergraduate students are attempting to make computers associate the content of a video clip with existing descriptions of characters and actions. The computer can then draw conclusions from what it sees and categorize new data accordingly.

Taskar’s research focuses on machine learning, which is the study of how to make computers understand data, thereby making them more intelligent.

Currently, videos, photos and other electronic media are categorized by hand: television and movie fans assign tags describing their contents so that the media can be searched on the internet. Even new “self-tagging” technologies have to rely on existing labels created by humans to tag new media.

Computers therefore have to be taught from scratch how to associate certain images, actions and sounds with their corresponding descriptions.

“It’s like a baby learning who and what the people and objects are in the world,” Taskar said.

Taskar likened the process to a baby who is flooded with images as he grows up and who learns what they are when people point out and name what he is seeing.

“But babies don’t learn that a kitty is a kitty from just seeing one once,” Taskar explained. “The baby needs to have several kitties pointed out to him before he starts associating the furry animal he is seeing with the name ‘kitty.’”

Using shows such as CSI, Alias and Lost, Taskar’s team gives a computer specialized algorithms that combine descriptive information with the video to decipher which person is which character, what exactly each character is doing and with whom.

Once the algorithm is perfected, the computer can then take new material and use its past learning to collect more knowledge — just like a baby growing up and learning more about its world.

“The real challenge, therefore, is to get computers to make correspondences from concurrences, conclusions from observations,” Taskar said.

The ultimate achievement of this research would be a computer that could use its artificial intelligence to search for any scene or action in a television show or movie clip based on a one- or two-word description, and even link the results to similar scenes and events.

This would then eliminate the need for humans to tag media by hand for search engines and databases.

Taskar’s research so far has helped his team program computers to name characters in online video clips, as well as find actions that are repeated throughout the clip.

“There’s always more to do,” Taskar said, “but the future looks really promising right now.”


Please note: All comments are eligible for publication in The Daily Pennsylvanian.