Weakly-Supervised Bayesian Learning of a CCG Supertagger

Dan Garrette, Chris Dyer, Jason Baldridge, and Noah A. Smith

We present a Bayesian formulation for weakly-supervised learning of a Combinatory Categorial Grammar (CCG) supertagger with an HMM. We assume supervision in the form of a tag dictionary, and our prior encourages the use of cross-linguistically common category structures as well as transitions between tags that can combine locally according to CCG's combinators. Our prior is theoretically appealing since it is motivated by language-independent, universal properties of the CCG formalism. Empirically, we show that it yields substantial improvements over previous work that used similar biases to initialize an EM-based learner. Additional gains are obtained by further shaping the prior with corpus-specific information that is extracted automatically from raw text and a tag dictionary.
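To make the idea of "tags that can combine locally" concrete, here is a minimal Python sketch. It is not the authors' model or prior: it checks only forward and backward application (CCG has further combinators, such as composition), and the names `Complex`, `can_combine`, and `transition_pseudocounts`, along with the weights 10.0 and 1.0, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Complex:
    """A complex CCG category such as NP/N or S\\NP (illustrative type)."""
    result: "Category"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    arg: "Category"


# Atomic categories are plain strings ("S", "NP", "N"); the rest are Complex.
Category = Union[str, Complex]


def can_combine(left: Category, right: Category) -> bool:
    """True if two adjacent categories combine by function application."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, Complex) and left.slash == "/" and left.arg == right:
        return True
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, Complex) and right.slash == "\\" and right.arg == left:
        return True
    return False


def transition_pseudocounts(
    tags: List[Category],
    combinable_weight: float = 10.0,  # assumed value, not from the paper
    other_weight: float = 1.0,        # assumed value, not from the paper
) -> Dict[Tuple[Category, Category], float]:
    """Toy transition pseudo-counts that favor locally combinable tag pairs."""
    return {
        (a, b): combinable_weight if can_combine(a, b) else other_weight
        for a in tags
        for b in tags
    }


# A tiny tag set: determiner NP/N, noun N, noun phrase NP, intransitive verb S\NP.
DET = Complex("NP", "/", "N")
N = "N"
NP = "NP"
IV = Complex("S", "\\", "NP")

counts = transition_pseudocounts([DET, N, NP, IV])
```

In this sketch the pair (NP/N, N) receives the larger pseudo-count because forward application applies, while (N, NP/N) keeps the smaller one. Pseudo-counts of this kind could, under these assumptions, parameterize a Dirichlet prior over the transition distribution of an HMM.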

Citation:

In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL-2014), pp. 141–150, Baltimore, MD, June 2014.
