UTCS Artificial Intelligence
Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild (2014)
Jesse Thomason, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, and Raymond Mooney
This paper integrates techniques in natural language processing and computer vision to improve recognition and description of entities and activities in real-world videos. We propose a strategy for generating textual descriptions of videos by using a factor graph to combine visual detections with language statistics. We use state-of-the-art visual recognition systems to obtain confidences on entities, activities, and scenes present in the video. Our factor graph model combines these detection confidences with probabilistic knowledge mined from text corpora to estimate the most likely subject, verb, object, and place. Results on YouTube videos show that our approach improves both the joint detection of these latent, diverse sentence components and the detection of some individual components when compared to using the vision system alone, as well as over a previous n-gram language-modeling approach. The joint detection allows us to automatically generate more accurate, richer sentential descriptions of videos with a wide array of possible content.
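The idea of combining detection confidences with language statistics can be illustrated with a toy example. The following is a minimal sketch, not the model from the paper: it assumes unary factors from a vision system and pairwise factors from text statistics, and finds the highest-scoring (subject, verb, object, place) tuple by brute-force enumeration. All labels, numbers, the factor structure, and the names `vision`, `language`, `DEFAULT`, `score`, `map_estimate`, and `vision_only` are hypothetical.

```python
import math
from itertools import product

# Hypothetical vision confidences (unary factors), one table per sentence slot.
vision = {
    "subject": {"person": 0.7, "dog": 0.3},
    "verb": {"ride": 0.4, "hit": 0.6},
    "object": {"horse": 0.6, "ball": 0.4},
    "place": {"field": 0.55, "kitchen": 0.45},
}

# Hypothetical language compatibilities (pairwise factors), standing in for
# statistics mined from text. Pairs not listed fall back to DEFAULT.
language = {
    ("subject", "verb"): {("person", "ride"): 0.5, ("person", "hit"): 0.4},
    ("verb", "object"): {("ride", "horse"): 0.8, ("hit", "ball"): 0.7},
    ("verb", "place"): {("ride", "field"): 0.6, ("hit", "field"): 0.5},
}
DEFAULT = 0.05


def score(assignment):
    """Log-score of one full assignment: sum of log unary and log pairwise factors."""
    total = sum(math.log(vision[slot][assignment[slot]]) for slot in vision)
    for (a, b), table in language.items():
        total += math.log(table.get((assignment[a], assignment[b]), DEFAULT))
    return total


def map_estimate():
    """Enumerate every assignment and return the highest-scoring one."""
    slots = list(vision)
    candidates = (dict(zip(slots, values))
                  for values in product(*(vision[s] for s in slots)))
    return max(candidates, key=score)


def vision_only():
    """Baseline: pick the most confident label for each slot independently."""
    return {slot: max(conf, key=conf.get) for slot, conf in vision.items()}


if __name__ == "__main__":
    print("vision only:", vision_only())
    print("joint:      ", map_estimate())
```

With these made-up numbers, the vision-only baseline picks the verb "hit" alongside the object "horse", while the joint estimate prefers "ride" because the language factors favor "ride" with "horse". Exhaustive enumeration is only practical for tiny label sets; a real system would need a proper inference procedure.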
Citation:
In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), pp. 1218--1227, Dublin, Ireland, August 2014.
Bibtex:
@inproceedings{thomason:coling14,
  title={Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild},
  author={Jesse Thomason and Subhashini Venugopalan and Sergio Guadarrama and Kate Saenko and Raymond Mooney},
  booktitle={Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014)},
  month={August},
  address={Dublin, Ireland},
  pages={1218--1227},
  url="http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=127457",
  year={2014}
}
People
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Jesse Thomason, Ph.D. Alumni, thomason DOT jesse AT gmail
Subhashini Venugopalan, Ph.D. Alumni, vsub [at] cs utexas edu
Areas of Interest
Language and Vision
Natural Language Processing
Labs
Machine Learning