Ensembling methods are well known in machine learning for improving prediction accuracy.
However, they are limited in that they cannot effectively discriminate among their
underlying component models: some models perform better than others on certain types of
input instances. How good a model's output is can sometimes be gauged from
"where" it extracted that output and "why" it made the prediction. This information can be exploited
to leverage the component models in an ensemble. In this proposal, we present stacking
with auxiliary features that integrates relevant information from multiple sources to improve
ensembling. We use two types of auxiliary features: instance features and provenance features.
Instance features enable the stacker to discriminate across input instances, while
provenance features enable it to discriminate across component systems. When the two are
combined, our algorithm learns to rely on systems that not only agree on an output but also
agree on the provenance of that output, in conjunction with the type of the input instance.
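As a concrete illustration, a stacker of this kind can be sketched as a meta-classifier over the concatenation of per-system confidence scores, instance features, and provenance features. The sketch below uses synthetic data and a simple logistic-regression stacker trained by gradient descent; the feature dimensions and labels are hypothetical, not taken from the actual systems described here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_systems, n_inst, n_prov = 200, 3, 4, 3

# Hypothetical feature groups for each candidate output:
conf = rng.random((n, n_systems))   # confidence score from each component system
inst = rng.random((n, n_inst))      # instance features (properties of the input)
prov = rng.random((n, n_prov))      # provenance features (where the output came from)

# The stacker sees all three groups side by side.
X = np.hstack([conf, inst, prov])

# Toy accept/reject labels: here, outputs the systems jointly trust are "correct".
y = (conf.mean(axis=1) > 0.5).astype(float)

# Train a logistic-regression stacker with plain gradient descent.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the linear score
    w -= 1.0 * (X.T @ (p - y) / n)          # gradient of the logistic loss
    b -= 1.0 * (p - y).mean()

# The trained stacker accepts or rejects each candidate output.
accepted = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
print("training accuracy:", (accepted == (y > 0.5)).mean())
```

Because the stacker weights all feature groups jointly, it can learn, for example, to trust a particular system's high-confidence outputs only when the provenance features indicate a reliable source for that type of input instance.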
We demonstrate our approach on three very different and difficult problems: Cold Start
Slot Filling, Tri-lingual Entity Discovery and Linking, and ImageNet Object Detection. The
first two problems are well known tasks in Natural Language Processing, and the third one is
in the domain of Computer Vision. Our algorithm obtains state-of-the-art results on the first
two tasks and significant improvements on the ImageNet task, demonstrating the power and
generality of our approach. We also present a novel stacking-based approach for combining
systems that lack training data, combined in an unsupervised ensemble, with systems that
have training data. This combined approach achieves state-of-the-art results on the Cold Start
Slot Filling and Tri-lingual Entity Discovery and Linking tasks, surpassing our own prior
results from ensembling only the supervised systems.
We propose several short-term and long-term extensions to our work. In the short term, we
focus on using more semantic instance-level features for all three tasks, and on
language-independent, non-lexical features for the two NLP tasks. In the long term, we
propose to demonstrate our ensembling algorithm on the Visual Question Answering task and
to use textual/visual explanations as auxiliary features for stacking.
PhD proposal, Department of Computer Science, The University of Texas at Austin.