Large-Scale Data Mining

CS 395T/CAM 395T

CS Unique No. 54320 / CAM Unique No. 65042

Fall 2004
TTh 3:30-5pm
Welch 3.260

Professor: Inderjit Dhillon
Office: ACES 2.332
Office Hours: Thurs 2-3pm

Handouts

  • Course Information (contains grading information) handed out on Aug 26.
  • Class Survey, Aug 26.
  • Sergey Brin, Rajeev Motwani, Lawrence Page, Terry Winograd, "What can you do with a Web in your Pocket?"
  • Amy N. Langville and Carl D. Meyer, "Deeper Inside PageRank"

Reference Textbooks

  • "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by T. Hastie, R. Tibshirani, J. Friedman, Springer-Verlag, 2001.
  • "Pattern Classification" by R. Duda, P. Hart and D. Stork, John Wiley and Sons, November 2000.

Reading List for Student Presentations

Class Projects

Lecture Notes

  • Lecture 1 - Finding good "hubs" and "authorities" for broad-topic queries.
  • Lecture 2 - Review of basic linear algebra (vectors, norms, eigenvalues/eigenvectors, SVD); proof that the hub and authority vectors converge to the dominant singular vectors.
  • Lectures 3 & 4 - HITS, Clever Project, Google's PageRank.
  • Lectures 5 & 6 - Vector Space Model, Latent Semantic Indexing, SVD.
  • Lectures 7, 8 & 9 - PCA, Clustering, Hierarchical Agglomerative Clustering (HAC), k-means.
  • Lecture 10 - Information Theory, Clustering and Bregman Divergences.
  • Lecture 11 - Graph partitioning algorithms (Kernighan-Lin, Spectral Partitioning, Multilevel methods such as Metis).
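
The convergence result listed under Lecture 2 can be illustrated with a minimal sketch of the hub/authority iteration. This is not taken from the lecture notes: the 4-page link graph, the function names, and the iteration count are assumptions chosen for illustration. The sketch repeats the updates a = A^T h and h = A a with normalization, then checks that the resulting authority vector behaves like an eigenvector of A^T A, which is what being a dominant right singular vector of A means.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def hits(adj, iterations=100):
    """Run the hub/authority iteration on an adjacency matrix.

    adj[i][j] == 1 means page i links to page j.
    """
    n = len(adj)
    hubs = normalize([1.0] * n)
    auths = normalize([1.0] * n)
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in (a = A^T h).
        auths = normalize(
            [sum(adj[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        )
        # Hub update: sum the authority scores of pages linked to (h = A a).
        hubs = normalize(
            [sum(adj[i][j] * auths[j] for j in range(n)) for i in range(n)]
        )
    return hubs, auths

def residual(adj, auths):
    """Return (lambda, max |A^T A a - lambda a|) for the authority vector."""
    n = len(adj)
    a_times = [sum(adj[i][j] * auths[j] for j in range(n)) for i in range(n)]
    ata = [sum(adj[i][j] * a_times[i] for i in range(n)) for j in range(n)]
    lam = sum(x * y for x, y in zip(auths, ata))  # Rayleigh quotient
    return lam, max(abs(ata[j] - lam * auths[j]) for j in range(n))

# Hypothetical link graph: pages 0, 1 and 3 all link to page 2.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
hubs, auths = hits(A)
lam, err = residual(A, auths)
print("authorities:", [round(x, 4) for x in auths])
print("hubs:", [round(x, 4) for x in hubs])
print("largest eigenvalue of A^T A (estimate):", lam, "residual:", err)
```

The residual should be close to zero when the iteration has converged. Convergence to a unique vector assumes that the largest eigenvalue of A^T A is strictly larger than the next one and that the starting vector is not orthogonal to the dominant eigenvector, which holds for this small graph.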

Homeworks

  • Homework 1
  • Homework 2
  • Homework 3

Related Courses

  • Previous offering of CS 395T in Fall 2003.
  • University of Minnesota's CSci 8363, Linear Algebra in Data Exploration, Spring 2003.
  • Stanford's CS 349, Data Mining, Search, and the World Wide Web, Fall 1998.
  • Stanford's Data Mining Course, 2000.
  • UT Austin ECE course EE 380L, A Practicum in Data Mining, Fall 1999.