UTCS Artificial Intelligence
A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities (2013)
Stephen Roller and Sabine Schulte im Walde
Recent investigations into grounded models of language have shown that holistic views of language and perception can provide higher performance than independent views. In this work, we improve a two-dimensional multimodal version of Latent Dirichlet Allocation (Andrews et al., 2009) in various ways. (1) We outperform text-only models in two different evaluations, and demonstrate that low-level visual features are directly compatible with the existing model. (2) We present a novel way to integrate visual features into the LDA model using unsupervised clusters of images. The clusters are directly interpretable and improve on our evaluation tasks. (3) We provide two novel ways to extend the bimodal models to support three or more modalities. We find that the three-, four-, and five-dimensional models significantly outperform models using only one or two modalities, and that nontextual modalities each provide separate, disjoint knowledge that cannot be forced into a shared, latent structure.
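As a rough illustration of the generative story behind a bimodal (text + visual) LDA of the kind extended here, the following toy Python/NumPy sketch has each latent topic emit both a word and a visual-cluster id. All names and hyperparameters (K, phi_text, phi_vis, alpha, beta) are illustrative assumptions for this sketch, not the paper's actual model, data, or inference procedure.

import numpy as np

rng = np.random.default_rng(0)

K = 4          # number of latent topics (illustrative)
V_text = 50    # text vocabulary size (illustrative)
V_vis = 10     # number of unsupervised image clusters, i.e. the visual "vocabulary"
alpha, beta = 0.1, 0.01   # symmetric Dirichlet hyperparameters

# Topic-specific emission distributions, one per modality.
phi_text = rng.dirichlet([beta] * V_text, size=K)   # shape (K, V_text)
phi_vis = rng.dirichlet([beta] * V_vis, size=K)     # shape (K, V_vis)

def generate_document(n_tokens=20):
    """Sample one document: every token draws a topic from the document's
    topic mixture, and that topic emits both a word id and a visual-cluster id."""
    theta = rng.dirichlet([alpha] * K)               # document-topic mixture
    words, vis_feats = [], []
    for _ in range(n_tokens):
        z = rng.choice(K, p=theta)                   # latent topic for this token
        words.append(rng.choice(V_text, p=phi_text[z]))
        vis_feats.append(rng.choice(V_vis, p=phi_vis[z]))
    return words, vis_feats

docs = [generate_document() for _ in range(5)]

The point of the sketch is the coupling: both modalities are generated from the same latent topic assignment, which is what lets perceptual information shape the learned topics. The paper's three-, four-, and five-modality extensions and its inference are not shown here.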
View:
PDF
Citation:
In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pp. 1146–1157, Seattle, WA, October 2013.
Bibtex:
@inproceedings{roller:emnlp13,
  title={A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities},
  author={Stephen Roller and Sabine Schulte im Walde},
  booktitle={Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)},
  month={October},
  address={Seattle, WA},
  pages={1146--1157},
  url={http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=127403},
  year={2013}
}
People
Stephen Roller
Ph.D. Alumni
roller [at] cs utexas edu
Areas of Interest
Language and Vision
Lexical Semantics
Natural Language Processing
Labs
Machine Learning