Large-Scale Data Mining
CS 395T
Unique Number: 49460
Course Announcement
Spring 2000
M-W 4:00-5:30pm
CPE 2.206
Professor: Inderjit Dhillon
Office: Taylor Hall 5.148
Office Hours: Wed 10:00-11:00am
TA: Shailesh Kumar
Office: ENS 518
Office Hours: Thurs 10am-1pm
Paper Readings
Class Projects
Handouts
Relevant Books (on reserve in PCL)
Lectures
Material to be covered
Mathematical preliminaries - basics of linear algebra.
SVD (Singular Value Decomposition) and its use in indexing documents.
For example, Latent Semantic Indexing (LSI).
LSI page at Bellcore.
LSI page at Univ. of Tennessee, Knoxville.
Matrices, Vector Spaces and Information Retrieval by Michael W. Berry, Zlatko Drmac, Elizabeth R. Jessup.
Clustering algorithms (agglomerative clustering, graph-based algorithms, k-means).
Classification algorithms (linear discriminant analysis).
Focused Crawling of the WWW.
Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery by Soumen Chakrabarti, Martin van den Berg and Byron Dom.
Data Visualization (Self-Organizing Maps (SOMs), Class-Preserving Projections).
Class Visualization of High-Dimensional Data with Applications by Inderjit Dhillon, Dharmendra Modha, and Scott Spangler, 1999. Free software is available here.
XGobi is a system for multivariate data visualization by Deborah Swayne, Di Cook, and Andreas Buja at Bellcore. The same page contains XGvis, which draws discrete graphs using MDS (Multidimensional Scaling) and was developed by Andreas Buja, Deborah F. Swayne, Michael L. Littman, and Nathaniel Dean. Free software is available from the provided link.
WEBSOM can plot 2-D maps of text documents using Kohonen's Self-Organizing Maps for Internet Exploration. The above link has a demo for visually browsing newsgroup data.
Support Vector Machines (SVMs) and their application to document classification.
Graph Partitioning with applications to Image Segmentation.
Lecture notes 1 & 2 on graph partitioning by Jim Demmel.
Normalized Cuts and Image Segmentation by Jianbo Shi and Jitendra Malik.
Motion Segmentation and Tracking Using Normalized Cuts by Jianbo Shi and Jitendra Malik.
The METIS Graph Partitioning Package.
SVD in face recognition.
Papers and Faces Database by Larry Sirovich.
Eigenfaces and Face Recognition at the MIT Media Lab.
Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection by Peter Belhumeur, João Hespanha, and David Kriegman, July 1997.
Analyzing the graph of the WWW (hubs and authorities, the CLEVER project at IBM, PageRank at Google).
Authoritative sources in a hyperlinked environment by Jon Kleinberg.
The CLEVER project at IBM Almaden.
Hypersearching the Web by Members of the CLEVER project.
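As a rough illustration of the hubs-and-authorities idea listed above, the following is a minimal sketch, not taken from the course materials. It assumes a tiny hypothetical four-page link graph stored as a dense adjacency matrix; the function name hits and the iteration count are illustrative choices. Each round sets a page's authority score from the hub scores of the pages linking to it, then sets its hub score from the authority scores of the pages it links to. With repeated normalization this is a power iteration, so the authority vector tends toward the principal eigenvector of A^T A, which ties the topic back to the SVD material earlier in the list.

```python
import numpy as np

def hits(adj, iterations=50):
    """Power-iteration sketch of hub and authority scores.

    adj[i, j] = 1 when page i links to page j.
    """
    n = adj.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auths = adj.T @ hubs
        auths /= np.linalg.norm(auths)
        # A good hub points to good authorities.
        hubs = adj @ auths
        hubs /= np.linalg.norm(hubs)
    return hubs, auths

# Hypothetical graph: pages 0 and 1 link to pages 2 and 3; page 2 links to page 3.
adj = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
], dtype=float)

hubs, auths = hits(adj)
print("hub scores:      ", np.round(hubs, 3))
print("authority scores:", np.round(auths, 3))
```

In this toy graph, page 3 receives the most links and ends up with the largest authority score, while pages 0 and 1 share the largest hub score. A real web graph would need a sparse matrix representation instead of a dense array.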
Related Courses
Stanford's CS 349, Data Mining, Search, and the World Wide Web, Fall 1998.
UC Berkeley's CS 294-7, Large Datasets, Fall 1999.
UT Austin ECE course EE 380L, A Practicum in Data Mining, Fall 1999.
Princeton's CIS 700/702, Information Retrieval, ?.