Researchers at the University are working to perfect a computer algorithm designed to summarize first-person perspective films, with the hope of aiding the elderly and memory-impaired.
Kristen Grauman, an associate professor of computer science and the project's leader, said her team has produced an algorithm capable of analyzing long segments of video and creating short storyboard summaries. The algorithm uses a combination of machine learning and optimization to predict the important elements in a video and show how they are connected.
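In broad strokes, such a pipeline might look like the sketch below: a learned scorer rates each candidate frame's importance, and a selection step builds a short storyboard from frames that are both important and connected to one another. Everything here, the linear scorer, the cosine-similarity "coherence" term and the greedy selection, is an illustrative assumption, not the team's published method.

```python
# Hypothetical sketch of the two-stage idea described above: score frames,
# then pick a short storyboard that balances importance and connectedness.
import numpy as np

rng = np.random.default_rng(0)

def importance(features, w):
    """Learned importance score per frame (a linear-model stand-in)."""
    return features @ w

def coherence(f_a, f_b):
    """How 'connected' two frames are (cosine-similarity stand-in)."""
    return float(f_a @ f_b / (np.linalg.norm(f_a) * np.linalg.norm(f_b) + 1e-8))

def summarize(features, w, k):
    """Greedily pick k frames maximizing importance plus chain coherence."""
    scores = importance(features, w)
    chosen = [int(np.argmax(scores))]          # seed with the top-scoring frame
    while len(chosen) < k:
        best, best_val = None, -np.inf
        for i in range(len(features)):
            if i in chosen:
                continue
            # Reward frames that are important AND connect to the last pick.
            val = scores[i] + coherence(features[chosen[-1]], features[i])
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
    return sorted(chosen)

frames = rng.normal(size=(500, 16))   # 500 candidate frames, 16-d features
weights = rng.normal(size=16)         # stand-in for learned scorer weights
print(summarize(frames, weights, k=5))  # indices of a 5-frame storyboard
```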
The story-driven video summarization technology is the first of its kind to focus on egocentric video, rather than on video shot from a stationary camera. Grauman said this perspective allows for greater applications of the summarizing feature and may be of use for memory-impaired individuals.
“If you think about who needs a first-person video summarized, you first think about life loggers, people who just do this for fun or for social media,” Grauman said. “But also what I would say are even more serious applications are clinical health or elder care kind of things where you need to monitor someone’s ability to do activities of daily living, or to help them recap or re-experience visual memories to help them jog [their] memory.”
Grauman said that while there are many uses for the technology, her team is focused on developing a more basic application.
“The data that we have now is more of a daily living kind of thing,” Grauman said. “If you talk about commercializing then you could think about specializing for specific application needs. We’re just focused on things we can come up with that would most likely be relevant for any kind of scenario.”
Grauman conducted her initial research with former postdoctoral research fellow Lu Zheng and former doctoral student Yong Jae Lee. Lee said his role focused on using machine-learning techniques to make the technology predict important objects.
“Since the video can be many hours, we aimed to find the frames that contain the most important people and objects,” Lee said. “In order to find these frames, we train an algorithm to predict important image regions using egocentric cues, like how often objects appear in the center of the frame.”
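As a rough illustration of the kind of cue Lee describes, the sketch below scores candidate image regions with a hand-built "center of frame" cue combined with a size cue. The specific cues, weights, and bounding-box format are assumptions for illustration, not the trained predictor from the research, where such weights would be learned from labeled egocentric video.

```python
# Illustrative region scoring from simple egocentric cues.
import numpy as np

def center_cue(box, frame_w, frame_h):
    """Cue: how close a region's center is to the frame center (0..1)."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    dx = (cx - frame_w / 2) / (frame_w / 2)
    dy = (cy - frame_h / 2) / (frame_h / 2)
    return 1.0 - min(1.0, (dx ** 2 + dy ** 2) ** 0.5)

def size_cue(box, frame_w, frame_h):
    """Cue: larger regions often correspond to objects being handled."""
    _, _, w, h = box
    return (w * h) / (frame_w * frame_h)

def region_score(box, frame_w, frame_h, weights=(0.7, 0.3)):
    """Weighted cue combination; in practice the weights would be learned."""
    cues = np.array([center_cue(box, frame_w, frame_h),
                     size_cue(box, frame_w, frame_h)])
    return float(np.array(weights) @ cues)

# A centered, medium-sized region outscores a small region in the corner.
print(region_score((500, 300, 280, 200), 1280, 720))
print(region_score((10, 10, 60, 40), 1280, 720))
```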
Zheng said he worked closely with Grauman to shape their ideas and formulate solutions to problems they encountered.
“I was mainly responsible for coming out with the research ideas, working on details of the algorithm, programming the algorithm and designing and performing experiments,” Zheng said. “Professor Grauman worked with [us] closely in various ways, such as shaping the initial idea and suggesting possible solutions to difficulties encountered along the way.”
Grauman and her team published several papers over the two years they spent working in this area of computer science, but she said the technology still has a lot of growing and evolving to do.
“We’re just getting better and looking at more technical problems that arise based on what we’ve done so far,” Grauman said. “You could certainly release some version of what we’re researching right now, but there’s just so much room for innovation and improvement.”