UTCS Artificial Intelligence
Semi-supervised Clustering by Seeding (2002)
Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper explores the use of labeled data to generate initial seed clusters, as well as the use of constraints generated from labeled data to guide the clustering process. It introduces two semi-supervised variants of KMeans clustering that can be viewed as instances of the EM algorithm, where labeled data provides prior information about the conditional distributions of hidden category labels. Experimental results demonstrate the advantages of these methods over standard random seeding and COP-KMeans, a previously developed semi-supervised clustering algorithm.
Citation:
In Proceedings of 19th International Conference on Machine Learning (ICML-2002), pp. 19-26, 2002.
Bibtex:
@InProceedings{basu:ml02,
  title={Semi-supervised Clustering by Seeding},
  author={Sugato Basu and Arindam Banerjee and Raymond J. Mooney},
  booktitle={Proceedings of 19th International Conference on Machine Learning (ICML-2002)},
  pages={19-26},
  url="http://www.cs.utexas.edu/users/ai-lab?basu:ml02",
  year={2002}
}
People
Sugato Basu
Ph.D. Alumni
sugato [at] cs utexas edu
Raymond J. Mooney
Faculty
mooney [at] cs utexas edu
Areas of Interest
Machine Learning
Semi-Supervised Learning
Text Categorization and Clustering
Labs
Machine Learning