CS 395T - Large-scale Data Mining

Homework 4

Clustering

    The main goal of this homework is to experiment with some clustering techniques.


    Answer the following questions:

    1. Report your clustering results for each of the 5 techniques on classic3, the 300-document set, and cmu.news 20_cleaned. For each clustering, submit the confusion matrix and the objective function value (if available).
    2. What is the number of clusters output by spmeans for each of the data sets? Is it 3 for the classic3 data?
    3. How did the clustering programs perform on your email (or other text collection)?
    4. Are your clustering results good? If not, explain why.
    5. In your opinion, which of the clustering techniques is the best? Why?
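
Question 1 asks for a confusion matrix for each clustering. As a minimal sketch of one way to tabulate it, assuming you have one true class label and one assigned cluster id per document, the Python below counts documents for every (cluster, class) pair. The function name `confusion_matrix`, the class labels "A", "B", "C", and the toy data are hypothetical and are not part of the course software or data sets.

```python
from collections import Counter


def confusion_matrix(true_labels, cluster_ids):
    """Count documents for each (cluster, class) pair.

    Rows correspond to clusters and columns to true classes,
    both in sorted order.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Counter returns 0 for pairs that never occur.
    counts = Counter(zip(cluster_ids, true_labels))
    matrix = [[counts[(k, c)] for c in classes] for k in clusters]
    return classes, clusters, matrix


# Hypothetical toy data: six documents, three classes, three clusters.
true_labels = ["A", "A", "B", "B", "C", "C"]
cluster_ids = [0, 0, 1, 1, 2, 1]

classes, clusters, matrix = confusion_matrix(true_labels, cluster_ids)
print("classes:", classes)
for k, row in zip(clusters, matrix):
    print("cluster", k, row)
```

A clustering that matches the classes well puts most of each row's mass in a single column; a row spread over several columns indicates a cluster that mixes classes.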

Due date: Oct. 30, 2001