UTCS Artificial Intelligence
Multimodal Contextualized Semantic Parsing from Speech (2024)
Jordan Voas, Raymond Mooney, David Harwath
We introduce Semantic Parsing in Contextual Environments (SPICE), a task designed to enhance artificial agents’ contextual awareness by integrating multimodal inputs with prior contexts. SPICE goes beyond traditional semantic parsing by offering a structured, interpretable framework for dynamically updating an agent’s knowledge with new information, mirroring the complexity of human communication. We develop the VG-SPICE dataset, crafted to challenge agents with visual scene graph construction from spoken conversational exchanges, highlighting speech and visual data integration. We also present the Audio-Vision Dialogue Scene Parser (AViD-SP) developed for use on VG-SPICE. These innovations aim to improve multimodal information processing and integration. Both the VG-SPICE dataset and the AViD-SP model are publicly available.
View: PDF, arXiv
Citation: Association for Computational Linguistics (ACL) (2024).
Bibtex:
@inproceedings{voas:acl24,
  title={Multimodal Contextualized Semantic Parsing from Speech},
  author={Jordan Voas and Raymond Mooney and David Harwath},
  booktitle={Association for Computational Linguistics (ACL)},
  month={August},
  url={http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=128061},
  year={2024}
}
Presentation: Slides (PDF), Poster, Video
People
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Jordan Voas, Ph.D. Student, jvoas [at] utexas edu
Areas of Interest
Connecting Language and Perception
Deep Learning
Language and Vision
Learning for Semantic Parsing
Speech
Labs
Machine Learning