UTCS Artificial Intelligence
Model-based Overlapping Clustering (2005)
A. Banerjee, C. Krumpelman, S. Basu, Raymond J. Mooney, and Joydeep Ghosh
While the vast majority of clustering algorithms are partitional, many real-world datasets have inherently overlapping clusters. The recent explosion of analysis of biological datasets, which frequently contain overlapping clusters, has led to new clustering models that allow hard assignment of data points to multiple clusters. One particularly appealing model was proposed by Segal et al. in the context of probabilistic relational models (PRMs) applied to the analysis of gene microarray data. In this paper, we start with the basic approach of Segal et al. and provide an alternative interpretation of the model as a generalization of mixture models, which makes it easily interpretable. While the original model maximized likelihood over constant-variance Gaussians, we generalize it to work with any regular exponential family distribution and the corresponding Bregman divergence, thereby making the model applicable to a wide variety of clustering distance functions, e.g., KL-divergence, Itakura-Saito distance, and I-divergence. The general model is applicable to several domains, including high-dimensional sparse domains such as text and recommender systems. We additionally offer several algorithmic modifications that improve both the performance and applicability of the model. We demonstrate the effectiveness of our algorithm through experiments on synthetic data as well as subsets of the 20-Newsgroups and EachMovie datasets.
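The distance functions named in the abstract are all Bregman divergences, which have the general form d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for a strictly convex generator phi. The following is a minimal illustrative sketch of that definition, not code from the paper: the function names are hypothetical, and the generators used (squared norm, negative entropy, Burg entropy) are the standard ones that yield squared Euclidean distance, I-divergence (KL-divergence for vectors that sum to one), and Itakura-Saito distance.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Generic Bregman divergence: phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Generator for squared Euclidean distance: phi(v) = ||v||^2
def sq_norm(v):
    return sum(t * t for t in v)

def sq_norm_grad(v):
    return [2.0 * t for t in v]

# Generator for I-divergence (KL on the simplex): phi(v) = sum v log v
def neg_entropy(v):
    return sum(t * math.log(t) for t in v)

def neg_entropy_grad(v):
    return [math.log(t) + 1.0 for t in v]

# Generator for Itakura-Saito distance: phi(v) = -sum log v
def burg_entropy(v):
    return -sum(math.log(t) for t in v)

def burg_entropy_grad(v):
    return [-1.0 / t for t in v]

# Closed forms, used here only to check the generic definition.
def i_divergence(x, y):
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))

def itakura_saito(x, y):
    return sum(xi / yi - math.log(xi / yi) - 1.0 for xi, yi in zip(x, y))

if __name__ == "__main__":
    x, y = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]  # strictly positive example vectors
    print("squared Euclidean:", bregman(sq_norm, sq_norm_grad, x, y))
    print("I-divergence:     ", bregman(neg_entropy, neg_entropy_grad, x, y))
    print("Itakura-Saito:    ", bregman(burg_entropy, burg_entropy_grad, x, y))
```

Swapping the generator changes the distance function without changing the surrounding code, which mirrors how the abstract describes one model covering many divergences. The entropy-based generators require strictly positive inputs.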
View: PDF, PS
Citation: In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-05), 2005.
Bibtex:
@inproceedings{banerjee:kdd05,
  title={Model-based Overlapping Clustering},
  author={A. Banerjee and C. Krumpelman and S. Basu and Raymond J. Mooney and Joydeep Ghosh},
  booktitle={Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-05)},
  url="http://www.cs.utexas.edu/users/ai-lab?banerjee:kdd05",
  year={2005}
}
People
Sugato Basu, Ph.D. Alumni, sugato [at] cs utexas edu
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Areas of Interest
Machine Learning
Text Categorization and Clustering
Unsupervised Learning, Clustering, and Self-Organization
Labs
Machine Learning