UTCS Artificial Intelligence
A Probabilistic Framework for Semi-Supervised Clustering (2004)
Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to the same or to different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints either to modify the objective function or to learn the distance measure. We propose a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that provides a principled framework for incorporating supervision into prototype-based clustering. The model generalizes a previous approach that combines constraints and Euclidean distance learning, and allows the use of a broad range of clustering distortion measures, including Bregman divergences (e.g., Euclidean distance and I-divergence) and directional similarity measures (e.g., cosine similarity). We present an algorithm that performs partitional semi-supervised clustering of data by minimizing an objective function derived from the posterior energy of the HMRF model. Experimental results on several text data sets demonstrate the advantages of the proposed framework.
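To make the role of pairwise constraints concrete, here is a minimal sketch of prototype-based clustering with a constraint-penalized objective. It is not the algorithm from the paper: it assumes squared Euclidean distance as the distortion measure, a single constant penalty weight `w` for every violated constraint, no distance learning, and a greedy point-by-point reassignment step. All function and parameter names are hypothetical and chosen for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(points, labels, centroids, must_link, cannot_link, w):
    """Distortion plus a constant penalty w for each violated constraint."""
    cost = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    cost += w * sum(1 for i, j in must_link if labels[i] != labels[j])
    cost += w * sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return cost


def constrained_kmeans(points, init_centroids, must_link, cannot_link,
                       w=1.0, iters=20):
    """Alternate greedy constrained assignment and centroid updates."""
    k = len(init_centroids)
    centroids = [tuple(c) for c in init_centroids]
    # Start from the plain nearest-centroid assignment.
    labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
              for p in points]

    def local_cost(i, c):
        # Cost of putting point i in cluster c, holding other labels fixed.
        cost = sq_dist(points[i], centroids[c])
        for a, b in must_link:
            if i in (a, b):
                other = b if a == i else a
                if labels[other] != c:
                    cost += w
        for a, b in cannot_link:
            if i in (a, b):
                other = b if a == i else a
                if labels[other] == c:
                    cost += w
        return cost

    for _ in range(iters):
        changed = False
        # Assignment step: move each point to its cheapest cluster in turn.
        for i in range(len(points)):
            best = min(range(k), key=lambda c: local_cost(i, c))
            if best != labels[i]:
                labels[i] = best
                changed = True
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        if not changed:
            break
    return labels, centroids


points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
init = [(0.0, 0.0), (5.0, 0.0)]

# Without constraints the points split by proximity.
print(constrained_kmeans(points, init, [], [])[0])                  # [0, 0, 1, 1]
# A heavily weighted must-link between points 1 and 2 pulls point 1 over.
print(constrained_kmeans(points, init, [(1, 2)], [], w=20.0)[0])    # [0, 1, 1, 1]
```

In this toy setup the penalty weight controls how strongly supervision overrides geometry: with a small `w`, a point can remain in its nearest cluster even if that violates a constraint, which mirrors the soft-constraint flavor described in the abstract.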
View:
PDF, PS
Citation:
In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), pp. 59-68, Seattle, WA, August 2004.
Bibtex:
@InProceedings{basu:kdd04,
  title={A Probabilistic Framework for Semi-Supervised Clustering},
  author={Sugato Basu and Mikhail Bilenko and Raymond J. Mooney},
  booktitle={Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004)},
  month={August},
  address={Seattle, WA},
  pages={59-68},
  url="http://www.cs.utexas.edu/users/ai-lab?basu:kdd04",
  year={2004}
}
People
Sugato Basu, Ph.D. Alumni, sugato [at] cs utexas edu
Mikhail Bilenko, Ph.D. Alumni, mbilenko [at] microsoft com
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Areas of Interest
Machine Learning
Semi-Supervised Learning
Text Categorization and Clustering
Labs
Machine Learning