UTCS Artificial Intelligence
Learning a Part-of-Speech Tagger from Two Hours of Annotation (2013)
Dan Garrette, Jason Baldridge
Most work on weakly supervised learning for part-of-speech taggers has been based on unrealistic assumptions about the amount and quality of training data. In this paper, we attempt to create true low-resource scenarios by allowing a linguist just two hours to annotate data and evaluating on the languages Kinyarwanda and Malagasy. Given these severely limited amounts of either type supervision (tag dictionaries) or token supervision (labeled sentences), we are able to dramatically improve the learning of a hidden Markov model through our method of automatically generalizing the annotations, reducing noise, and inducing word-tag frequency information.
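To make the two kinds of supervision concrete, here is a minimal sketch of a generic supervised baseline, not the authors' method: a bigram hidden Markov model whose counts come from a few labeled sentences (token supervision), decoded with Viterbi search restricted to the tags that a tag dictionary (type supervision) allows for each known word. The paper's annotation generalization, noise reduction, and frequency induction are not reproduced. The function names, the add-alpha smoothing, the `<s>`/`</s>` boundary symbols, and the toy English data are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical illustration of a tag-dictionary-constrained HMM tagger;
# this is NOT the method described in the paper.

def build_tag_dict(tagged_sents):
    """Map each word to the set of tags it was annotated with (type-level info)."""
    tag_dict = defaultdict(set)
    for sent in tagged_sents:
        for word, tag in sent:
            tag_dict[word].add(tag)
    return dict(tag_dict)

def estimate_hmm(tagged_sents, tags, alpha=1.0):
    """Return add-alpha smoothed log-probability functions for transitions and emissions."""
    trans, prev_count = Counter(), Counter()
    emit, tag_count = Counter(), Counter()
    vocab = set()
    for sent in tagged_sents:
        prev = "<s>"  # assumed sentence-start symbol
        for word, tag in sent:
            trans[(prev, tag)] += 1
            prev_count[prev] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
        trans[(prev, "</s>")] += 1  # assumed sentence-end symbol
        prev_count[prev] += 1
    n_next = len(tags) + 1    # every tag plus the end symbol
    n_words = len(vocab) + 1  # one extra slot reserved for unseen words

    def log_trans(prev, tag):
        return math.log((trans[(prev, tag)] + alpha) / (prev_count[prev] + alpha * n_next))

    def log_emit(tag, word):
        return math.log((emit[(tag, word)] + alpha) / (tag_count[tag] + alpha * n_words))

    return log_trans, log_emit

def viterbi(words, tags, log_trans, log_emit, tag_dict):
    """Most probable tag sequence; known words may only take their dictionary tags."""
    if not words:
        return []
    # Unknown words fall back to the full tag set.
    allowed = [sorted(tag_dict.get(w, tags)) for w in words]
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in allowed[0]}]
    back = [{t: None for t in allowed[0]}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in allowed[i]:
            score, prev = max((best[i - 1][p] + log_trans(p, t), p) for p in allowed[i - 1])
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    _, last = max((best[-1][t] + log_trans(t, "</s>"), t) for t in allowed[-1])
    path = [last]
    for i in range(len(words) - 1, 0, -1):  # follow back-pointers to the start
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy usage with made-up English data.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]
tags = ["DET", "NOUN", "VERB"]
tag_dict = build_tag_dict(train)
log_trans, log_emit = estimate_hmm(train, tags)
print(viterbi(["the", "dog", "barks"], tags, log_trans, log_emit, tag_dict))  # ['DET', 'NOUN', 'VERB']
```

The unseen word "barks" may take any tag, so the transition model alone decides it; for every word the dictionary covers, the search is limited to that word's listed tags.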
View: PDF
Citation: Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-13) (2013), pp. 138–147.
BibTeX:
@inproceedings{garrette:naacl13,
  title={Learning a Part-of-Speech Tagger from Two Hours of Annotation},
  author={Dan Garrette and Jason Baldridge},
  booktitle={Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-13)},
  month={June},
  address={Atlanta, GA},
  pages={138--147},
  url={http://www.cs.utexas.edu/users/ai-lab?garrette:naacl13},
  year={2013}
}
Presentation: Slides (PDF), Video
People
Dan Garrette
Ph.D. Alumnus
dhg [at] cs utexas edu
Areas of Interest
Machine Learning
Natural Language Processing
Semi-Supervised Learning
Labs
Machine Learning