Experiments on Ensembles with Missing and Noisy Data (2004)

Prem Melville, Nishit Shah, Lilyana Mihalkova, and Raymond J. Mooney

One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. DECORATE is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and DECORATE to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, DECORATE is the most robust. For classification noise, bagging and DECORATE are both robust, with bagging being slightly better than DECORATE, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
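The three kinds of imperfect data named above can be illustrated with a small sketch. This is not the authors' experimental code. The function names, the per-value corruption rate, and the choice to draw noisy feature values from the same column are all assumptions made for illustration; the paper itself should be consulted for the exact procedures used.

```python
import random


def add_missing_features(rows, rate, rng):
    """Return a copy of `rows` in which each feature value is replaced
    by None (treated here as "missing") with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def add_classification_noise(labels, rate, classes, rng):
    """Return a copy of `labels` in which each label is, with probability
    `rate`, replaced by a different class chosen uniformly at random.
    Assumes `classes` contains at least two distinct values."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def add_feature_noise(rows, rate, rng):
    """Return a copy of `rows` in which each feature value is, with
    probability `rate`, replaced by a value drawn at random from the
    values observed in the same column (an illustrative assumption)."""
    columns = list(zip(*rows))
    return [
        [rng.choice(columns[j]) if rng.random() < rate else v
         for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data, used only to show how the helpers are called.
rng = random.Random(0)
rows = [[1.0, "a"], [2.0, "b"], [3.0, "a"], [4.0, "c"]]
labels = ["pos", "neg", "pos", "neg"]

print(add_missing_features(rows, 0.2, rng))
print(add_classification_noise(labels, 0.2, ["pos", "neg"], rng))
print(add_feature_noise(rows, 0.2, rng))
```

In a comparison of this kind, corruption would typically be applied at several rates, each ensemble method trained on the corrupted data, and accuracy plotted against the corruption rate to see how quickly each method degrades.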

Citation:

In Lecture Notes in Computer Science: Proceedings of the Fifth International Workshop on Multiple Classifier Systems (MCS-2004), F. Roli, J. Kittler, and T. Windeatt (Eds.), Vol. 3077, pp. 293-302, Cagliari, Italy, June 2004. Springer-Verlag.

People

Prem Melville	Ph.D. Alumni	pmelvi [at] us ibm com
Lilyana Mihalkova	Ph.D. Alumni	lilymihal [at] gmail com
Raymond J. Mooney	Faculty	mooney [at] cs utexas edu

Areas of Interest

Ensemble Learning, Inductive Learning, Machine Learning

Labs

Machine Learning