About "spkmeans"

-- A text data clustering tool

The program:

Features:

spkmeans

"spkmeans" can also reduce the dimensionality of the original word-document matrix through concept decomposition or QR decomposition of the concept vectors. This may be useful for classification and query retrieval.

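As a rough illustration of the dimensionality reduction described above, the sketch below builds one concept vector per cluster (the normalized sum of the cluster's unit-length document vectors), orthonormalizes the concept vectors with Gram-Schmidt (which gives the Q factor of a thin QR decomposition), and represents each document by its coordinates in that basis. It is a minimal pure-Python sketch, not the spkmeans implementation: the function names, the toy data, and the fixed cluster assignment are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def concept_vector(docs):
    """Concept vector of one cluster: normalized sum of its document vectors."""
    return normalize([sum(column) for column in zip(*docs)])


def gram_schmidt(vectors):
    """Orthonormalize the vectors (modified Gram-Schmidt); this is the Q of a thin QR."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:  # skip vectors that are (numerically) linearly dependent
            basis.append([x / norm for x in w])
    return basis


def reduce_dim(doc, basis):
    """Coordinates of doc in the orthonormal basis (one number per concept)."""
    return [sum(a * b for a, b in zip(doc, q)) for q in basis]


# Hypothetical toy data: 4 words, 2 clusters of 2 unit-length documents each.
clusters = [
    [normalize([2, 1, 0, 0]), normalize([1, 1, 0, 0])],
    [normalize([0, 0, 1, 2]), normalize([0, 0, 1, 1])],
]

concepts = [concept_vector(docs) for docs in clusters]
basis = gram_schmidt(concepts)
reduced = [reduce_dim(doc, basis) for docs in clusters for doc in docs]

for coords in reduced:
    print([round(c, 3) for c in coords])
```

Each 4-dimensional document is now described by 2 numbers, one per concept. Projecting onto the orthonormal basis is equivalent to a least-squares fit onto the span of the concept vectors, which is why the QR route and the concept decomposition are closely related.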
Input: Word-document matrix in CCS (compressed column storage) format. (NOT the documents themselves! See how to get the matrix from documents here.)
Output: The clusters of both documents and words as well as matrices after dimension reduction.
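
To make the input format more concrete: CCS stores only the nonzero entries of a sparse matrix, column by column, using three arrays (column pointers, row indices, and values). The sketch below converts a tiny dense word-document matrix into these arrays. It assumes rows are words and columns are documents, uses zero-based indices, and the names `dense_to_ccs`, `ccs_column`, `col_ptr`, `row_idx`, and `values` are hypothetical. It shows only the generic in-memory layout; the exact files "spkmeans" expects are not reproduced here, so consult the README for those.

```python
def dense_to_ccs(matrix):
    """Convert a dense matrix (list of rows) into CCS arrays.

    Returns (col_ptr, row_idx, values), where the nonzeros of column j are
    values[col_ptr[j]:col_ptr[j + 1]] and sit in rows
    row_idx[col_ptr[j]:col_ptr[j + 1]].
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    col_ptr, row_idx, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if matrix[i][j] != 0:
                row_idx.append(i)
                values.append(matrix[i][j])
        col_ptr.append(len(values))  # end of column j, start of column j + 1
    return col_ptr, row_idx, values


def ccs_column(col_ptr, row_idx, values, j):
    """Return column j (one document) as a {row index: value} mapping."""
    start, end = col_ptr[j], col_ptr[j + 1]
    return dict(zip(row_idx[start:end], values[start:end]))


# Hypothetical 4-word x 3-document count matrix.
word_doc = [
    [2, 0, 0],
    [1, 1, 0],
    [0, 3, 0],
    [0, 0, 4],
]

col_ptr, row_idx, values = dense_to_ccs(word_doc)
print(col_ptr)   # column pointers
print(row_idx)   # row (word) index of each nonzero
print(values)    # the nonzero entries themselves
print(ccs_column(col_ptr, row_idx, values, 1))  # words occurring in document 1
```

Because each document is a contiguous slice of `row_idx` and `values`, reading one document's words costs time proportional to its number of nonzeros, which suits a clustering algorithm that repeatedly compares documents with concept vectors.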

Example (for just clustering):

classic3_tfn_doctoclus.3

classic3_tfn_wordtoclus.3

Code:

code

GNU General Public License (GPL)

README

here

Citation:

You are welcome to use the code under the terms of the licence for research or commercial purposes; however, please acknowledge its use with a citation:
Dhillon, I. S. and Modha, D. S., "Concept Decompositions for Large Sparse Text Data using Clustering", Machine Learning, 42:1, pages 143-175, Jan, 2001.
Dhillon, I. S. and Fan, J. and Guan, Y., "Efficient Clustering of Very Large Document Collections", invited book chapter in Data Mining for Scientific and Engineering Applications, Kluwer Academic Publishers, 2001.
Here are the BibTeX entries:
@ARTICLE{dhillon:modha:mlj01,
      AUTHOR = {Dhillon, I. S. and Modha, D. S.},
      TITLE = {Concept decompositions for large sparse text data using clustering},
      JOURNAL = {Machine Learning},
      YEAR = {2001},
      MONTH = {Jan},
      VOLUME = {42},
      NUMBER = {1},
      PAGES = {143--175} }
@INCOLLECTION{dhillon:fan:guan00,
      AUTHOR = {Dhillon, I. S. and Fan, J. and Guan, Y.},
      TITLE = {Efficient Clustering of Very Large Document Collections},
      BOOKTITLE = {Data Mining for Scientific and Engineering Applications},
      PUBLISHER = {Kluwer Academic Publishers},
      EDITOR = {R. Grossman and C. Kamath and V. Kumar and R. Namburu},
      YEAR = {2001},
      PAGES = {},
      NOTE = {Invited book chapter}}

Contact me:

Bug reports and comments are appreciated!