Data mining is the process of analyzing data sets for patterns or structure that, once discovered, can be used for a variety of tasks, generally classified as either descriptive or predictive. Most businesses and organizations collect data about their operations, products, and customers, and apply data mining techniques to this data to better understand and improve these activities. Research projects in many areas of science employ data mining to develop descriptive models that aid understanding in their respective domains, and to develop predictive models. |
  |
This course provides the student with an understanding of the issues around collecting, understanding, and preparing data for use in a broad range of data mining areas (including classification, association analysis, clustering, anomaly detection, and regression). For each of these tasks, along with dimensionality reduction, the student will become familiar with multiple techniques for generating descriptive or predictive models, and with metrics and methods for evaluating the suitability of the generated models for the associated task. To practice these techniques for generating and evaluating models, the student will apply data mining tools to provided data sets. |
  |
CS 429 (or 310) or 429H (or 310H). |
  |
|
  |
David Franke
|
  |
Xueyu Mao
|
  |
Date | Assignment # Given | Assignment # Due | Points | Topics - Reading |
---|---|---|---|---|
Jan. 18 | | | | Introduction |
Jan. 23 | | | | Data |
Jan. 25 | | | | Classification |
Jan. 30 | | | | |
Feb. 1 | | | | Sections 4.3.7, 4.4 |
Feb. 6 | | | | Sections 4.2, 4.5, 4.6 |
Feb. 8 | | | | Sections 5.1, 5.2 |
Feb. 13 | | | | Sections 2.4, 5.3 |
Feb. 15 | | | | Sections 5.4, 5.5 |
Feb. 20 | | | | Sections 5.6, 5.7 |
Feb. 22 | | | | Association Analysis: Sections 6.2, 6.3, 6.4 |
Feb. 27 | | | | |
Mar. 1 | | | | Sections 6.5, 6.6 |
Mar. 6 | | | | Sections 6.7, 6.8 |
Mar. 8 | | | | Midterm Exam |
Mar. 13 | | | | Spring Break |
Mar. 15 | | | | |
Mar. 20 | | | | Clustering: Sections 8.1, 8.2 |
Mar. 22 | | | | Sections 8.3, 8.4 |
Mar. 27 | | | | Sections 8.5, 9.1 |
Mar. 29 | | | | Sections 9.2, 9.3, 9.4 |
Apr. 3 | | | | Anomaly Detection: Sections 10.1, 5.7 |
Apr. 5 | | | | Sections 10.2, 10.3 |
Apr. 10 | | | | Sections 10.4, 10.5 |
Apr. 12 | | | | Regression |
Apr. 17 | | | | |
Apr. 19 | | | | |
Apr. 24 | | | | |
Apr. 26 | | | | Dimensionality Reduction |
May 1 | | | | |
May 3 | | | | Review |
May 13, 9-12 | | | | Final Exam: CLA 0.102 |

Assignments are due at the start of class (9:30 AM) on the due date, as we will be discussing the assignment solution during that class period.
Penalty for late submission is 40%, and late submissions will only be accepted up to the next assignment due date.
Final grade will be determined as follows:
As the practice of data mining requires interpretation of results and an understanding of the strengths and limitations of techniques and algorithms, it is expected that significant learning will be gained from discussion of problems and experiences. Students are expected to participate in class discussions, posing questions and proposing solutions.
You are free to discuss approaches to solving the assigned problems with your classmates, but each student is expected to write their own solutions. If duplicate work is detected, all parties involved will be penalized. All students should read and be familiar with the UTCS Rules to Live By. |
  |
Any student with a documented disability who requires academic accommodations should contact Services for Students with Disabilities (SSD) at (512) 471-6259 (voice) or 1-866-329-3986 (video phone). Faculty are not required to provide accommodations without an official accommodation letter from SSD.
|
  |
Important dates for the Fall 2016 semester can be found on the Academic Calendar. |
  |