UTCS Artificial Intelligence
Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder (2019)
Jialin Wu and Raymond J. Mooney
Most RNN-based image captioning models receive supervision only on the output words, so that they mimic human captions. As a result, the hidden states receive only noisy gradient signals through many steps of back-propagation through time, which leads to less accurate generated captions. To address this, we propose a novel framework, Hidden State Guidance (HSG), that matches the hidden states in the caption decoder to those in a teacher decoder trained on the easier task of autoencoding the captions conditioned on the image. During training with the REINFORCE algorithm, the conventional rewards are sentence-based evaluation metrics distributed equally to every generated word, regardless of each word's relevance. HSG provides a word-level reward that helps the model learn better hidden representations. Experimental results demonstrate that HSG clearly outperforms various state-of-the-art caption decoders using raw images, detected objects, or scene graph features as inputs.
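The contrast between a sentence-level reward and a word-level reward can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the hidden states are matched, so the negative mean squared distance used here, the function names, and the `weight` parameter are all assumptions made for illustration only.

```python
from typing import List, Sequence


def sentence_level_rewards(sentence_score: float, num_words: int) -> List[float]:
    """Conventional setup: every generated word receives the same
    sentence-level metric score, regardless of its relevance."""
    return [sentence_score] * num_words


def hidden_state_rewards(
    student_states: Sequence[Sequence[float]],
    teacher_states: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical word-level reward: the negative mean squared distance
    between the caption decoder's hidden state and the teacher decoder's
    hidden state at each time step (an assumed matching criterion)."""
    if len(student_states) != len(teacher_states):
        raise ValueError("student and teacher must cover the same time steps")
    rewards = []
    for student, teacher in zip(student_states, teacher_states):
        if len(student) != len(teacher):
            raise ValueError("hidden states must have the same dimension")
        mse = sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)
        rewards.append(-mse)
    return rewards


def combined_rewards(
    sentence_score: float,
    student_states: Sequence[Sequence[float]],
    teacher_states: Sequence[Sequence[float]],
    weight: float = 1.0,
) -> List[float]:
    """Add the per-word guidance term to the shared sentence-level score.
    `weight` is an assumed trade-off hyperparameter."""
    base = sentence_level_rewards(sentence_score, len(student_states))
    guidance = hidden_state_rewards(student_states, teacher_states)
    return [b + weight * g for b, g in zip(base, guidance)]
```

In this sketch, a word whose hidden state is far from the teacher's receives a lower reward than its neighbors, whereas the sentence-level score alone cannot distinguish between words in the same caption.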
View: PDF, Other
Citation: In Proceedings of the Visually Grounded Interaction and Language Workshop at NeurIPS 2019, December 2019.
Bibtex:
@inproceedings{wu:vigil19,
  title={Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder},
  author={Jialin Wu and Raymond J. Mooney},
  booktitle={Proceedings of the Visually Grounded Interaction and Language Workshop at NeurIPS 2019},
  month={December},
  url="http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=127776",
  year={2019}
}
Presentation: Poster
People
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Jialin Wu, Ph.D. Alumni, jialinwu [at] utexas edu
Areas of Interest
Language and Vision
Labs
Machine Learning