UTCS Artificial Intelligence
What is the Best Automated Metric for Text to Motion Generation? (2023)
Jordan Voas
There is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task, and automated metrics should correlate well with human quality judgments. Since a single description is compatible with many motions, determining the right metric is critical both for evaluating generative models and for designing meaningful training losses to supervise them. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments at the sample level. However, for assessing average model performance, commonly used metrics such as R-Precision and rarely used coordinate errors show strong correlations. Several recently developed metrics are not recommended because they correlate less well with human judgments than the alternatives. Additionally, we introduce multiple novel metrics that exhibit improved correlation and show potential for future use.
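The distinction between sample-level and model-level correlation can be illustrated with a small sketch. This is not the thesis's evaluation code: the use of Pearson correlation, the record layout, the function names, and the toy scores below are all assumptions made for illustration. It shows how a metric can track average model quality closely while saying little about individual samples.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def sample_level_correlation(records):
    """Correlate metric and human scores over every individual sample."""
    return pearson([r["metric"] for r in records], [r["human"] for r in records])


def model_level_correlation(records):
    """Average both scores per model first, then correlate the per-model means."""
    by_model = {}
    for r in records:
        by_model.setdefault(r["model"], []).append(r)
    metric_means = [mean(r["metric"] for r in rows) for rows in by_model.values()]
    human_means = [mean(r["human"] for r in rows) for rows in by_model.values()]
    return pearson(metric_means, human_means)


# Hypothetical toy scores: three models, three generated motions each.
TOY_SCORES = {
    "A": ([0.2, 0.6, 0.4], [2, 1, 3]),
    "B": ([0.7, 0.3, 0.5], [2, 4, 3]),
    "C": ([0.4, 0.8, 0.6], [5, 3, 4]),
}
records = [
    {"model": name, "metric": m, "human": h}
    for name, (metrics, humans) in TOY_SCORES.items()
    for m, h in zip(metrics, humans)
]

sample_r = sample_level_correlation(records)
model_r = model_level_correlation(records)
print(f"sample-level r = {sample_r:.2f}, model-level r = {model_r:.2f}")
```

In this toy data the per-model averages line up with the human ratings almost perfectly, while the per-sample scores are only weakly (here even slightly negatively) related, which mirrors the kind of gap the abstract describes.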
Citation:
Masters Thesis, Department of Computer Science, UT Austin.
Bibtex:
@mastersthesis{voas:msthesis23,
  title={What is the Best Automated Metric for Text to Motion Generation?},
  author={Jordan Voas},
  month={May},
  school={Department of Computer Science, UT Austin},
  address={Austin, TX},
  url="http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=128040",
  year={2023}
}
People
Jordan Voas
Ph.D. Student
jvoas [at] utexas edu
Areas of Interest
Connecting Language and Perception
Deep Learning
Labs
Machine Learning