UTCS Artificial Intelligence
Collecting Highly Parallel Data for Paraphrase Evaluation (2011)
David L. Chen and William B. Dolan
A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and the lexical dissimilarity of paraphrase candidates. These metrics are simple and efficient to compute, and experiments show that they correlate highly with human judgments.
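The abstract does not spell out the formulas, so the following is only a rough sketch of how "simple n-gram comparisons" can score both properties, under stated assumptions: whitespace tokenization, n-grams up to length 4, adequacy approximated by clipped n-gram precision against the many parallel reference sentences (BLEU-like, but with an arithmetic mean and no brevity penalty, so it is not BLEU itself), and lexical dissimilarity approximated by the fraction of the candidate's distinct n-grams that do not appear in the source sentence, in the spirit of the PINC metric defined in the full paper. The function names `adequacy` and `dissimilarity`, and the choice to skip n-gram orders the candidate is too short to have, are hypothetical; this is not the authors' implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list; empty if len(tokens) < n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def adequacy(candidate, references, max_n=4):
    """Hypothetical adequacy score: mean clipped n-gram precision vs. all references.

    Each candidate n-gram is credited at most as many times as it occurs in any
    single reference (BLEU-style clipping). Orders with no candidate n-grams are skipped.
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    scores = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            continue
        # Highest count of each n-gram in any one reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        scores.append(matched / total)
    return sum(scores) / len(scores) if scores else 0.0


def dissimilarity(source, candidate, max_n=4):
    """Hypothetical PINC-style score: mean fraction of candidate n-grams absent from source.

    0.0 means every candidate n-gram already occurs in the source; 1.0 means none do.
    """
    src = source.split()
    cand = candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_set = set(ngrams(cand, n))
        if not cand_set:
            continue
        src_set = set(ngrams(src, n))
        scores.append(1 - len(cand_set & src_set) / len(cand_set))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    source = "a man is playing a guitar"
    candidate = "a guy plays the guitar"
    references = ["a man plays a guitar", "a guy is playing the guitar", "someone plays guitar"]
    print(adequacy(candidate, references))   # 0.375
    print(dissimilarity(source, candidate))  # roughly 0.9
```

The two scores pull in different directions: copying the source verbatim maximizes overlap with references but drives dissimilarity to 0.0, while an unrelated sentence scores 1.0 on dissimilarity and near 0.0 on adequacy. Having many parallel references per meaning is what makes the adequacy side usable with such a simple measure.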
Citation:
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 190-200, Portland, Oregon, USA, June 2011.
Bibtex:
@inproceedings{chen:acl11,
  title={Collecting Highly Parallel Data for Paraphrase Evaluation},
  author={David L. Chen and William B. Dolan},
  booktitle={Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics},
  month={June},
  address={Portland, Oregon, USA},
  pages={190-200},
  url="http://www.cs.utexas.edu/users/ai-lab?chen:acl11",
  year={2011}
}
People: David Chen (Ph.D. Alumni), cooldc [at] hotmail com
Areas of Interest: Natural Language Processing
Labs: Machine Learning