UTCS Artificial Intelligence
Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection (2011)
David L. Chen and William B. Dolan
Traditional methods of collecting translation and paraphrase data are prohibitively expensive, making the construction of large, new corpora difficult. While crowdsourcing offers a cheap alternative, quality control and scalability can become problematic. We discuss a novel annotation task that uses videos as the stimulus, which discourages cheating. In addition, our approach requires only monolingual speakers, making it easier to scale because more workers are qualified to contribute. Finally, we employ a multi-tiered payment system that helps retain good workers over the long term, resulting in a persistent, high-quality workforce. We present the results of one of the largest linguistic data collection efforts to date using Mechanical Turk, yielding 85K English sentences and more than 1K sentences for each of a dozen more languages.
View:
PDF
Citation:
In Proceedings of The 3rd Human Computation Workshop (HCOMP 2011), August 2011.
Bibtex:
@inproceedings{che.hcomp11,
  title={Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection},
  author={David L. Chen and William B. Dolan},
  booktitle={Proceedings of The 3rd Human Computation Workshop (HCOMP 2011)},
  month={August},
  url="http://www.cs.utexas.edu/users/ai-lab?che:hcomp11",
  year={2011}
}
Presentation:
Slides (PPT)
People
David Chen
Ph.D. Alumni
cooldc [at] hotmail com
Areas of Interest
Natural Language Processing
Labs
Machine Learning