UTCS Artificial Intelligence
Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries (2012)
Dan Garrette and Jason Baldridge
Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MIN-GREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-GREEDY algorithm for both English and Italian data.
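As a rough illustration of the idea of initializing HMM emissions from a tag dictionary and raw data, the sketch below splits each known word's raw count evenly among the tags its dictionary entry allows, and distributes unknown words across tags in proportion to how many word types the dictionary lists for each tag (a crude stand-in for "openness"). This is a simplified sketch under stated assumptions, not the formula from the paper: the function name `init_emissions`, the `smoothing` parameter, and the exact weighting scheme are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def init_emissions(raw_tokens, tag_dict, all_tags, smoothing=0.1):
    """Return {tag: {word: p(word | tag)}} as an initial emission table.

    Illustrative assumption, not the paper's exact method:
    - a word in the tag dictionary splits its raw count evenly over its tags;
    - a word missing from the dictionary is shared over all tags, weighted
      by each tag's "openness" (number of dictionary word types it covers).
    """
    word_counts = Counter(raw_tokens)

    # Openness proxy: how many distinct word types the dictionary lists per tag.
    openness = Counter(t for tags in tag_dict.values() for t in tags)
    total_open = sum(openness.values())
    denom = total_open + smoothing * len(all_tags)

    pseudo = defaultdict(lambda: defaultdict(float))
    for word, count in word_counts.items():
        if word in tag_dict:
            tags = tag_dict[word]
            for t in tags:
                pseudo[t][word] += count / len(tags)
        else:
            for t in all_tags:
                share = (openness[t] + smoothing) / denom
                pseudo[t][word] += count * share

    # Normalize the pseudo-counts of each tag into a distribution over words.
    emissions = {}
    for t in all_tags:
        total = sum(pseudo[t].values())
        emissions[t] = (
            {w: c / total for w, c in pseudo[t].items()} if total > 0 else {}
        )
    return emissions


# Toy usage: "blorp" is absent from the (incomplete) dictionary.
raw = "the dog barks the cat runs the blorp".split()
tag_dict = {
    "the": {"DET"},
    "dog": {"NOUN"},
    "cat": {"NOUN"},
    "barks": {"VERB", "NOUN"},
    "runs": {"VERB", "NOUN"},
}
emissions = init_emissions(raw, tag_dict, ["DET", "NOUN", "VERB"])
```

In this toy setup the unknown word receives more initial probability mass under tags that cover many word types in the dictionary (here NOUN) than under closed-class tags (here DET). A table like this would only serve as a starting point for EM training, which then re-estimates the emissions.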
Citation:
In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), pp. 821--831, Jeju, Korea, July 2012.
Bibtex:
@inproceedings{garrette:emnlp12,
  title={Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries},
  author={Dan Garrette and Jason Baldridge},
  booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012)},
  month={July},
  address={Jeju, Korea},
  pages={821--831},
  url="http://www.cs.utexas.edu/users/ai-lab?garrette:emnlp12",
  year={2012}
}
People
Dan Garrette
Ph.D. Alumni
dhg [at] cs utexas edu
Areas of Interest
Machine Learning
Natural Language Processing
Semi-Supervised Learning
Labs
Machine Learning