Syllabus for CS 327E Elements of Databases - Spring 2022

Class Meetings: Friday 4:00pm - 7:00pm
Class Location: Zoom and GDC 1.304*

Instructor: Shirley Cohen
Email: scohen at cs dot utexas dot edu
Office Hours: Mondays from 7:00pm - 8:00pm on Zoom

TA: Karan Sadananda Karnad
Email: karan dot karnad at utexas dot edu
Office Hours: Tuesdays and Thursdays from 9:30am - 11:00am

TA: Sameer Haniyur
Email: sameerhaniyur2 at utexas dot edu
Office Hours: Wednesdays and Fridays from 12:00pm - 1:30pm

*The class will meet both on Zoom and in person. Even when we meet in person, the class will be available via Zoom for students who choose to join virtually. For all Zoom links (class meetings and office hours), please see Canvas.

Course Description:
This course is designed to give students a practical introduction to databases and data systems. The goal is to learn modern data management and data processing techniques through a mix of best practices, experimentation, and problem solving.

The contents of the course are organized into three broad areas: 1) query languages with an emphasis on SQL; 2) data models from relational to document to graph; and 3) data engineering with a focus on data processing and scalability testing.

We will construct multiple databases for operational and analytical purposes throughout the term. The work will be done on Google Cloud Platform using a variety of database technologies and data science tools: MySQL, Postgres, BigQuery, Firestore, MongoDB, Neo4j, Jupyter Notebooks, and Data Studio.

Below are some of the topics we will cover:

SQL:
- select-from-where
- order-bys
- joins
- inserts, updates, deletes
- aggregates
- group-bys
- subqueries
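
As a rough preview of these SQL topics, the sketch below runs one small statement of each kind against an in-memory SQLite database from Python. SQLite is used here only because it ships with Python; it is not one of the course systems, and the dept and emp tables and their rows are invented for illustration.

```python
import sqlite3

# In-memory database; the schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, "
    "salary INTEGER, dept_id INTEGER REFERENCES dept(id))"
)

# inserts
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Eng"), (2, "Ops")])
cur.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(1, "Ana", 120, 1), (2, "Bo", 100, 1), (3, "Cy", 90, 2), (4, "Di", 80, 2)],
)

# updates and deletes
cur.execute("UPDATE emp SET salary = salary + 10 WHERE name = 'Cy'")
cur.execute("DELETE FROM emp WHERE name = 'Di'")

# select-from-where with an order-by
high_paid = cur.execute(
    "SELECT name FROM emp WHERE salary >= 100 ORDER BY name"
).fetchall()

# join + aggregate + group-by: average salary per department
avg_by_dept = cur.execute(
    "SELECT d.name, AVG(e.salary) "
    "FROM emp e JOIN dept d ON e.dept_id = d.id "
    "GROUP BY d.name ORDER BY d.name"
).fetchall()

# subquery: employees paid more than the overall average
above_avg = cur.execute(
    "SELECT name FROM emp WHERE salary > (SELECT AVG(salary) FROM emp)"
).fetchall()

print(high_paid)    # [('Ana',), ('Bo',), ('Cy',)]
print(avg_by_dept)  # [('Eng', 110.0), ('Ops', 100.0)]
print(above_avg)    # [('Ana',)]
```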

Data Models:
- relational
- document
- graph
- nested
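
To make the contrast between these models concrete, the sketch below reshapes the same small dataset three ways using plain Python structures. The customers and orders, the field names, and the PLACED edge label are all invented for illustration; real systems such as Firestore, MongoDB, or Neo4j have their own APIs and conventions.

```python
# Relational model: two flat tables, linked by a customer_id foreign key.
customers = [(1, "Ana"), (2, "Bo")]  # (customer_id, name)
orders = [(10, 1, "book"), (11, 1, "pen"), (12, 2, "lamp")]  # (order_id, customer_id, item)

# Document / nested model: each customer document embeds its own orders.
documents = [
    {
        "_id": cid,
        "name": name,
        "orders": [
            {"order_id": oid, "item": item}
            for oid, order_cid, item in orders
            if order_cid == cid
        ],
    }
    for cid, name in customers
]

# Graph model: labelled nodes with properties, plus directed edges between them.
nodes = {("Customer", cid): {"name": name} for cid, name in customers}
nodes.update({("Order", oid): {"item": item} for oid, _, item in orders})
edges = [(("Customer", cid), "PLACED", ("Order", oid)) for oid, cid, _ in orders]

print(documents[0])
print(edges)
```

The relational form keeps the link between the tables in a key column, the document form nests related rows inside their parent, and the graph form turns the link itself into a first-class edge.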

Data Engineering:
- ingestion
- data transformations
- data visualizations
- scalability testing

Prerequisites:
The course assumes a programming background and, in particular, a solid working knowledge of Python scripting. As such, the prerequisites for this course are CS 303E, CS 307, or the equivalent. Familiarity with SQL is also helpful, but not required.

Textbooks:
There are two required texts for this course:
- Alan Beaulieu, Learning SQL, Third Edition, 2020.
- Dan Sullivan, NoSQL for Mere Mortals, First Edition, 2015.

Supplemental Readings:
In addition to the required readings, the assignments will involve consulting the product documentation on Cloud SQL, Cloud Spanner, BigQuery, Firestore, MongoDB, Neo4j, and Data Studio. All documentation will be available online.

Projects:
The projects are the most important component of this course. They are intended to give you hands-on experience with the database systems and tools. They will start with the basic CRUD operations and move on to more advanced capabilities.

There are two types of projects: weekly projects and a Final Project. The weekly projects are aimed at giving you practice with each of the database systems. They will be assigned as homework and will require time outside of class to complete. The Final Project will be a scalability study of a chosen database system. You will design experiments to evaluate the scalability of the system and document your findings in a written report.
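
As a rough illustration of what a scalability experiment can look like, the sketch below times bulk inserts at increasing data sizes. It is only a sketch under stated assumptions: in-memory SQLite stands in for the database system you choose, and the table, the row sizes, and the helper name time_bulk_insert are hypothetical. A real study would also repeat each run and vary other factors.

```python
import sqlite3
import time


def time_bulk_insert(n_rows):
    """Insert n_rows into a fresh in-memory table; return (row_count, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(i, f"row-{i}") for i in range(n_rows)]

    start = time.perf_counter()
    with conn:  # wrap all inserts in a single transaction
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count, elapsed


# Hypothetical data sizes; each step is 10x the previous one.
results = {n: time_bulk_insert(n) for n in (1_000, 10_000, 100_000)}
for n, (count, elapsed) in results.items():
    print(f"{n:>7} rows requested, {count} stored, {elapsed:.4f}s")
```

Comparing how the elapsed time grows as the row count grows tenfold gives a first indication of whether the operation scales roughly linearly.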

All projects will be carried out in groups of two students. You will form groups at the start of the term and work with the same partner throughout the term. More details on the projects will be provided in the week-by-week section below.

Exams:
There will be two midterms and no final exam. The tests are comprehensive and will cover all the material to date, including readings, projects, and lectures. They will be open-book and taken during class time via Canvas. Unfortunately, no make-up tests will be offered due to our limited resources.

Participation:
We will be holding synchronous class meetings so that you have the opportunity to discuss questions and work together with other students. My goal is to spend the majority of class time actively working through problems and clarifying difficult concepts. You will need to have a stable internet connection and a laptop or desktop computer so that you can be fully present for each class.

There are two types of participation questions. The first will be multiple choice and answered through UT Instapoll. These questions will be based on the assigned setup guide for that day as well as the practice problems we work on in class. The second type will be a short presentation on your solution to a practice problem, shared with the whole class.

Absences:
Excused absences may be given only for verifiable medical or family emergencies. Written documentation must be provided to qualify for an excused absence. The medical documentation must specifically state that you could not attend class due to your illness and must be signed by a physician. A job or internship interview or any other appointment does not constitute an excused absence.

Grading Rubric:
The basic grading rubric consists of the four components listed below:

Note: The final grade will use the plus/minus grading system.

Late Submission Policy:
Late submissions incur a 10% reduction in the grade for each day they are late. This applies to all project submissions throughout the term.
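
The arithmetic can be sketched as follows. This is one possible reading of the policy, not an official calculation: it assumes the reduction is 10% of the earned score per day, that any partial day counts as a full day, and that the score cannot drop below zero. The function name late_adjusted_score is hypothetical.

```python
import math


def late_adjusted_score(score, days_late, rate=0.10):
    """Apply a flat per-day reduction (assumed interpretation of the policy)."""
    full_days = math.ceil(max(days_late, 0))  # assumption: partial days round up
    adjusted = score * (1 - rate * full_days)
    return round(max(adjusted, 0.0), 2)       # assumption: floor at zero


print(late_adjusted_score(90, 0))  # 90.0 (on time)
print(late_adjusted_score(90, 2))  # 72.0 (two days late: 90 * 0.8)
```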

Tools:
- Zoom for online instruction.
- Google Cloud Platform for practice problems and project work.
- GitHub for code repository, version control, and how-to guides.
- Lucidchart for diagramming.
- Piazza for asynchronous communication (announcements, questions, discussions).
- Canvas for grade reporting.

Academic Integrity:
This course will abide by UTCS' code of academic integrity.

Students with Disabilities:
Students with disabilities may request appropriate academic accommodations.

Week-by-Week Schedule:
Below is a week-by-week schedule that includes the important milestones and assigned readings:

Acknowledgments:
Google generously supports this course by giving us access to its Cloud Platform.