Syllabus for CS378: Natural Language Processing

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am held on Zoom
Instructor Office Hours: Tuesday 1pm-2pm / Wednesday 10am-11am
TAs: Yasumasa Onoe (yasumasa@utexas.edu), Shrey Desai (shreydesai@utexas.edu)
TA Office Hours: held on Zoom

Changes made to accommodate the COVID-19 situation have been highlighted in red

Description

Natural language processing (NLP) is a subfield of AI focused on solving problems that involve dealing with human language in a sophisticated way: these include information extraction, machine translation, automatic summarization, conversational dialogue, syntactic analysis, and many others. Much of the progress on these problems over the last 25 years has been driven by statistical machine learning and, more recently, deep learning. One distinctive feature of language compared to other types of data is its structured nature: modeling language involves understanding the linguistic phenomena it exhibits and grappling with it as a sequentially-structured, tree-structured, or graph-structured entity.

This class is intended to be a survey of modern NLP in two respects. First, it covers the main applications of NLP techniques today, both in academia and in industry, as well as enough linguistics to put these problems in context and understand their challenges. Second, it covers a range of models in structured prediction and deep learning including classifiers, sequence models, statistical parsers, neural network encoders, and encoder-decoder models. We study the models themselves, examples of problems they are applied to, inference methods, parameter estimation, and optimization. Programming assignments involve building scalable machine learning systems for various NLP tasks and seeing how these models can be put into practice.

Prerequisites

Lectures

Lectures are 9:30-11:00am Tuesday and Thursday, held remotely on Zoom. A full schedule of lectures and assignments, including readings, is on the main website page.

The Zoom lectures will be recorded and made available later for students in the class to watch. Any student who is uncomfortable with being recorded should keep their microphone muted and video off. We will not distribute these recorded lectures to anyone outside of the class, and you should not either, for privacy and copyright reasons.

Coursework

The timeline of assignments is on the course calendar. All assignment deadlines after spring break have been pushed back by one week. Assignment specifications, code, and data will be made available on the course website and Canvas.

The assignments and their weights in the final grade are unchanged since the start of the semester. The primary difference is that the final project timeline is accelerated and, correspondingly, expectations for the final project have been scaled back. Concrete expectations will be conveyed when the final project spec is released.

Religious Holy Days: A student who is absent from an examination or cannot meet an assignment deadline due to the observance of a religious holy day may take the exam on an alternate day or submit the assignment up to 24 hours late without penalty, if proper notice of the planned absence has been given. Notice must be given at least 14 days prior to the classes which will be missed. For religious holy days that fall within the first 2 weeks of the semester, notice should be given on the first day of the semester. Notice should be personally delivered to the instructor and signed and dated by the instructor, or emailed, in which case a student submitting email notification must receive email confirmation from the instructor.

Other Extensions: Extensions may be granted in cases of medical emergency or other circumstances. In all cases, the student should inform the course staff as soon as is practical, and the extension must be negotiated before the assignment's original due date.

Assignments

The assignments will feature a combination of written questions and coding assignments of varying scope. Detailed instructions for assignment completion and submission are given with each assignment.

Slip Days: Each student is given 2 slip days to use throughout the term. Any number of these days can be applied to any assignment to extend the deadline for that assignment by that many days. E.g., you can turn in Assignment 1 one day late and Assignment 4 one day late, or you could turn in a single assignment two days late. Slip days can only be used for assignments and not the midterm or final project. Slip days cannot be used fractionally: you must choose to use 0, 1, or 2 slip days for an assignment.

Starting with Assignment 4, each student is now given two additional slip days. These can be applied to Assignments 4 or 5 in addition to any other slip days you still have. They cannot be applied retroactively to earlier assignments.

Late Assignments: For each day an assignment is late that is not covered by a slip day or a negotiated extension (both described above), 15% of the credit for that assignment will be deducted. So, an assignment turned in two days late with no slip days applied will automatically lose 30%.
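
As a rough illustration of how slip days and the late penalty interact, here is a minimal Python sketch. The function name late_penalty_percent and its parameters are hypothetical and not part of any course tooling. Capping the deduction at 100% is an assumption; the policy above does not state what happens beyond that point.

```python
def late_penalty_percent(days_late: int, slip_days_used: int = 0) -> int:
    """Return the percentage of credit deducted for a late assignment.

    Illustrative sketch only: 15% is deducted per late day that is not
    covered by a slip day. The cap at 100% is an assumption.
    """
    if days_late < 0 or slip_days_used < 0:
        raise ValueError("days_late and slip_days_used must be non-negative")
    # Slip days are whole days; each one covers exactly one late day.
    uncovered_days = max(0, days_late - slip_days_used)
    return min(100, 15 * uncovered_days)


# Two days late, no slip days applied: 30% deducted.
print(late_penalty_percent(2))
# Two days late, both covered by slip days: no deduction.
print(late_penalty_percent(2, slip_days_used=2))
```

Negotiated extensions are not modeled here; under this sketch they would simply reduce days_late before the function is called.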

Midterm

There will be one in-class midterm as described on the course calendar. Students will be allowed one standard letter-size (8.5" x 11") page of notes during the exam. Use of electronic communication devices (phones, laptops, calculators, etc.) is banned during the exam.

Final Project

The final project is either an in-depth exploration of question answering or an opportunity for more open-ended exploration of concepts in the course. Both options can be completed individually or in groups of 2; working in groups is encouraged! If you wish to pursue your own independent project, your group must write a brief 1-page proposal by March 31 describing what you plan to do and how you plan to do it, which the course staff will provide feedback on. Independent projects do not necessarily have to "work," but will be held to a high standard in terms of expected effort, insight, and technical sophistication.

Final Grades

Your final grade is computed based on the total points earned across all assignments. The final grade is mapped to a letter as follows, with grades on the boundary receiving the higher grade:

A 100 - 93.3
A- 93.3 - 90.0
B+ 90.0 - 86.6
B 86.6 - 83.3
B- 83.3 - 80.0
C+ 80.0 - 76.6
C 76.6 - 73.3
C- 73.3 - 70.0
D 70.0 - 65.0
F below 65.0

Depending on class performance, the instructors may shift these boundaries down to raise students' grades.
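
The mapping above can be sketched in a few lines of Python. This is only an illustration using the unshifted boundaries listed in this section; the names CUTOFFS and letter_grade are hypothetical, and the comparison uses >= so that a score exactly on a boundary receives the higher grade, as stated above.

```python
# (lower bound, letter) pairs from the table above, highest first.
CUTOFFS = [
    (93.3, "A"),
    (90.0, "A-"),
    (86.6, "B+"),
    (83.3, "B"),
    (80.0, "B-"),
    (76.6, "C+"),
    (73.3, "C"),
    (70.0, "C-"),
    (65.0, "D"),
]


def letter_grade(score: float) -> str:
    """Map a final score (0-100) to a letter using the unshifted boundaries."""
    for lower_bound, letter in CUTOFFS:
        # Boundary scores receive the higher grade, hence >=.
        if score >= lower_bound:
            return letter
    return "F"


print(letter_grade(90.0))  # a boundary score gets the higher grade
print(letter_grade(64.9))
```

If the instructors shift the boundaries down, only the numbers in CUTOFFS would change under this sketch.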

Academic Honesty

Please read the department's academic honesty policies. For this course, students should complete all assignments independently, excluding the final project, which may be completed in groups. Limit any discussion of assignments with other students to clarification of the requirements or definitions of the problems, or to understanding the existing code or general course material. Never discuss issues directly relevant to problem solutions with other students. Finally, note that you may not use external resources (e.g., code on GitHub that does the assigned task) except where explicitly authorized by the course staff.

Be sure you respect these policies when posting on Piazza. Asking clarifying questions, addressing possible bugs in the provided code, etc. are fair game, but you should not discuss solutions in a substantive way. When in doubt, post privately to the instructors.

Students who violate these policies may receive a failing grade on the assignment in question or for the course overall, depending on the instructors' judgment and the severity of the infraction.

Miscellaneous

Disabilities: Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities at 512-471-6259.