Self-Critical Reasoning for Robust Visual Question Answering (2019)
Jialin Wu and Raymond J. Mooney
Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution [1]. To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than other competitive answer candidates do. The influential regions are determined either from human visual/textual explanations or automatically from just the significant words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state of the art: 49.5% using textual explanations and 48.5% using automatically annotated regions.
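The core of the objective described above can be sketched as a hinge-style penalty: for each human-identified influential region, a competing answer's sensitivity should not exceed the correct answer's. This is an illustrative sketch only; the function name, inputs, and exact formulation are assumptions, not the paper's actual code.

```python
# Hedged sketch of a self-critical, hinge-style objective (assumed
# formulation for illustration; not the authors' implementation).
def self_critical_loss(sens_correct, sens_competitor, influential):
    """Penalize cases where a competing answer candidate is more sensitive
    than the correct answer to the influential image regions.

    sens_correct / sens_competitor: per-region sensitivity scores
    influential: booleans marking the influential regions
    """
    loss = 0.0
    for sc, sw, infl in zip(sens_correct, sens_competitor, influential):
        if infl:
            # hinge term: zero once the correct answer dominates this region
            loss += max(0.0, sw - sc)
    return loss
```

For example, if the correct answer already dominates the first influential region but not the second, only the second region contributes to the loss; in a full system, the competitor would be the highest-scoring wrong answer and the sensitivities would come from gradient-based attribution over detected object regions.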
View:
PDF
Citation:
In
Proceedings of Neural Information Processing Systems (NeurIPS)
, December 2019.
Bibtex:
@inproceedings{wu:neurips19,
  title={Self-Critical Reasoning for Robust Visual Question Answering},
  author={Jialin Wu and Raymond J. Mooney},
  booktitle={Proceedings of Neural Information Processing Systems (NeurIPS)},
  month={December},
  year={2019},
  url={http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=127777}
}
Presentation:
Slides (PDF)
Poster
People
Raymond J. Mooney
Faculty
mooney [at] cs utexas edu
Jialin Wu
Ph.D. Alumni
jialinwu [at] utexas edu
Areas of Interest
Explainable AI
Language and Vision
Labs
Machine Learning