Syllabus for CS 327E Elements of Databases - Fall 2016

Instructor: Shirley Cohen
Email: scohen at cs dot utexas dot edu
Office hours: After class or by appointment in GDC 4.314

TA: Hao Chen
Email: chenhao at utexas dot edu
Office hours: Tuesdays and Thursdays 3:30pm - 4:30pm in GDC 1.302 at the TA Station Desk 1

Proctor: Adam Roach
Email: adamroach94 at gmail dot com
Office hours: Thursdays 6:30pm - 8pm in CBA 3.300 (Atrium in business building)

Class Times: Mondays and Wednesdays 6:30pm - 8pm
Class Location: CLA 1.104

Course Description:
This course is a practical introduction to relational databases and SQL. It is a course about how to use a database and develop a database application rather than a course about database administration.

The contents of the course are organized into three main topics: 1) database schema design with a focus on the relational model; 2) SQL with an emphasis on the standard features of the query language; and 3) database application development, namely loading structured datasets into a relational database and querying the database from an application. The course will also touch upon a few special topics such as SQL injections, data warehousing, and big data.

Throughout the term we will use MySQL server to gain hands-on experience with SQL programming. I have chosen MySQL because it is simple yet expressive and it is also the database engine that is used by our SQL textbook.

There will be three lab projects over the course of the semester, each worth 10%. The labs will build on each other and build up to the final project, worth 20%. The labs and final project will consist of an end-to-end data pipeline written in Python with a MySQL database back-end. The data pipeline will have the following capabilities: a data loader that injests one or more related datasets from CSV files and a second loader that pulls a related JSON dataset from a RESTful API. The data pipeline will cleanse and transform the injested CSV datasets before loading them into a fully normalized schema in a MySQL database. The data pipeline will store the JSON documents retrieved from the API in their native form inside MySQL using the new JSON type in MySQL 5.7. The schema will use views to join the related datasets. The data pipeline will have a read-only query interface that supports retrieving the data through the use of filters, aggregates, group-bys, and order-bys. The query interface will also support querying the JSON documents through the use of MySQL's JSON extensions.

Students will work in pairs on the labs and final project. The code for all deliverables will be kept under version control on Github. At the end of the semester, each group will submit a technical report that describes their project and lessons learned throughout the semester. Students will choose their lab partner at the start of the term and work with the same partner throughout the semester.

The course will be taught using active learning techniques as much as possible. Therefore, students will spend the majority of class time actively working through problems rather than listening to a traditional lecture. It is therefore crucial that you prepare for each class by doing the assigned readings prior to class time. There will be a short quiz at the start of class based on the assigned readings. You will earn points for participating in class activities. The goal is to use class time to review difficult concepts, work through problems, make progress on the project, and learn from your peers.

Below is a list of some of the topics we will be covering throughout the term.

Database Design:
-logical database design
-inheritance
-normalization forms (with focus on 3NF)
-physical database design
-optimization techniques

SQL:
-select-from-where
-order-bys
-joins
-inserts, updates, deletes
-aggregates and group-bys
-views
-path expressions (for querying JSON)

Application Development:
-data injestion
-data transformation
-data exchange
-data integration
-basic security (e.g. protections against SQL injections)

Textbooks:
There are three required texts for the course and one optional text. If you are new to Python, I highly recommend that you obtain all four texts.

Required texts:
Graeme Simsion and Graham Witt's Data Modeling Essentials, 3rd edition, 2005.
Alan Beaulieu's Learning SQL, 2nd edition, 2009.
Jacqueline Kazil and Katharine Jarmul's Data Wrangling with Python, 1st edition, 2016.

Optional text:
Mark Lutz's Learning Python, 5th edition, 2013.

Note that the Learning SQL textbook is available as an ebook through the UT Library.

Clickers:
We will be using clickers for reading quizzes and class participation. You will need to have your own clicker and bring it to class each time. You can use either the newer i>clicker 2 or the older i>clicker. You can buy an i>clicker 2 from the University Co-op if you don't already have one. Once you have your clicker, please remember to register it with this class. The easiest way to do that is through Canvas (click on the i>Clicker link from the left navigation menu and follow the instructions).

Grading Rubric:
The basic grading rubric is comprised of the five components listed below.

Note that there is no midterm for this class and the final will be a closed book exam.

Late Submission Policy:
10% reduction in the grade per day.

Communication and Collaboration Tools:
Piazza for class discussions.
Canvas for grade reporting.
Github for labs and final project.

Academic Integrity:
This course will abide by UTCS' code of academic integrity.

Students with Disabilities:
Students with disabilities may request appropriate academic accomodations.

Week-by-Week Schedule:
Below is a week-by-week schedule that includes the important milestones and assigned readings.