UTCS Artificial Intelligence
Measuring Sound Symbolism in Audio-visual Models (2024)
Wei-Cheng Tseng, Yi-Jen Shih, David Harwath, Raymond Mooney
Audio-visual pre-trained models have gained substantial attention recently and demonstrated superior performance on various audio-visual tasks. This study investigates whether pre-trained audio-visual models exhibit non-arbitrary associations between sounds and visual representations, known as sound symbolism, which is also observed in humans. We developed a specialized dataset with synthesized images and audio samples and assessed these models using a non-parametric approach in a zero-shot setting. Our findings reveal a significant correlation between the models' outputs and established patterns of sound symbolism, particularly in models trained on speech data. These results suggest that such models can capture sound-meaning connections akin to human language processing, providing insights into both cognitive architectures and machine learning strategies.
View: PDF, arXiv
Citation: IEEE Spoken Language Technology (SLT) Workshop (2024).
BibTeX:
@inproceedings{tseng:slt24,
  title={Measuring Sound Symbolism in Audio-visual Models},
  author={Wei-Cheng Tseng and Yi-Jen Shih and David Harwath and Raymond Mooney},
  booktitle={IEEE Spoken Language Technology (SLT) Workshop},
  month={December},
  year={2024},
  url={http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=128105}
}
People
Raymond J. Mooney
Faculty
mooney [at] cs utexas edu
Areas of Interest
Connecting Language and Perception
Deep Learning
Speech
Labs
Machine Learning