UTCS Artificial Intelligence
Watch, Listen & Learn: Co-training on Captioned Images and Videos (2008)
Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney
Recognizing visual scenes and activities is challenging: often visual cues alone are ambiguous, and it is expensive to obtain manually labeled examples from which to learn. To cope with these constraints, we propose to leverage the text that often accompanies visual data to learn robust models of scenes and actions from partially labeled collections. Our approach uses co-training, a semi-supervised learning method that accommodates multi-modal views of data. To classify images, our method learns from captioned images of natural scenes; and to recognize human actions, it learns from videos of athletic events with commentary. We show that by exploiting both multi-modal representations and unlabeled data our approach learns more accurate image and video classifiers than standard baseline algorithms.
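The co-training idea described in the abstract can be sketched as follows. This is a minimal, generic two-view co-training loop in the style of Blum and Mitchell, not the authors' implementation. The nearest-centroid base learner, the margin-based confidence score, and the helper names (`centroid_classifier`, `co_train`) are illustrative assumptions. View 1 stands in for text features (captions or commentary) and view 2 for visual features. In each round, each view's classifier labels the unlabeled examples it is most confident about, and those examples join the shared labeled set used to retrain both classifiers.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# A scorer maps a feature vector to (predicted label, confidence margin).
Scorer = Callable[[Vector], Tuple[int, float]]


def centroid_classifier(X: List[Vector], y: List[int]) -> Scorer:
    """Hypothetical stand-in base learner: nearest-centroid classification."""
    centroids: Dict[int, Vector] = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> Tuple[int, float]:
        # Squared Euclidean distance to every class centroid, closest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, m)), c) for c, m in centroids.items()
        )
        # Confidence = gap between the closest and second-closest centroid.
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
        return dists[0][1], margin

    return predict


def co_train(
    view1: List[Vector],
    view2: List[Vector],
    seed_labels: Dict[int, int],
    unlabeled: List[int],
    rounds: int = 5,
    per_round: int = 1,
) -> Dict[int, int]:
    """Grow a shared labeled set by letting each view label examples for the other."""
    labeled = dict(seed_labels)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        idx = sorted(labeled)
        ys = [labeled[i] for i in idx]
        # Train one classifier per view on the current labeled set.
        f1 = centroid_classifier([view1[i] for i in idx], ys)
        f2 = centroid_classifier([view2[i] for i in idx], ys)
        for f, view in ((f1, view1), (f2, view2)):
            # Each view labels its most confident unlabeled examples; they join
            # the shared labeled set, so the other view trains on them next round.
            scores = {i: f(view[i]) for i in pool}
            ranked = sorted(pool, key=lambda i: scores[i][1], reverse=True)
            for i in ranked[:per_round]:
                labeled[i] = scores[i][0]
                pool.remove(i)
    return labeled


if __name__ == "__main__":
    # Toy data: two well-separated classes, labeled seeds at indices 0 and 1.
    text_view = [[0.0], [10.0], [1.0], [9.0], [2.0], [8.0]]
    visual_view = [[0.0], [10.0], [0.5], [9.5], [1.5], [8.5]]
    print(co_train(text_view, visual_view, {0: 0, 1: 1}, [2, 3, 4, 5]))
    # -> {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
```

Because both views feed one shared labeled set, an example that is easy to label from its text can help train the visual classifier, and vice versa. With real captioned images or videos, the base learners and the confidence measure would need to be chosen to suit each modality.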
Citation:
In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), pp. 457-472, Antwerp, Belgium, September 2008.
Bibtex:
@inproceedings{gupta:ecml-pkdd08,
  title={Watch, Listen \& Learn: Co-training on Captioned Images and Videos},
  author={Sonal Gupta and Joohyun Kim and Kristen Grauman and Raymond Mooney},
  booktitle={Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)},
  month={September},
  address={Antwerp, Belgium},
  pages={457--472},
  url={http://www.cs.utexas.edu/users/ai-lab?gupta:ecml-pkdd08},
  year={2008}
}
People
Kristen Grauman
Faculty
grauman [at] cs utexas edu
Sonal Gupta
Masters Alumni
sonal [at] cs stanford edu
Joohyun Kim
Ph.D. Alumni
scimitar [at] cs utexas edu
Raymond J. Mooney
Faculty
mooney [at] cs utexas edu
Areas of Interest
Computer Vision
Language and Vision
Machine Learning
Semi-Supervised Learning
Labs
Machine Learning
Computer Vision