Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, GDC 5.302
Instructor Office Hours: Tuesday 1pm-2pm / Wednesday 11am-12pm, GDC 3.420
TAs: Jiacheng Xu (jcxu@cs.utexas), Shivangi Mahto (shivangi@cs.utexas)
TA Office Hours: Monday/Wednesday 1pm-2pm (Jiacheng) GDC 1.302 Desk 1/2, Thursday 2pm-3pm (Shivangi) GDC 1.302 Desk 3
Natural language processing (NLP) is a subfield of AI focused on solving problems that involve dealing with human language in a sophisticated way: these include information extraction, machine translation, automatic summarization, conversational dialogue, syntactic analysis, and many others. Much of the progress on these problems over the last 25 years has been driven by statistical machine learning and, more recently, deep learning. One distinctive feature of language compared to other types of data is its structured nature: modeling language involves understanding the linguistic phenomena it exhibits and grappling with it as a sequentially structured, tree-structured, or graph-structured entity.
This class is intended to be a survey of modern NLP in two respects. First, it covers the main applications of NLP techniques today, both in academia and in industry, as well as enough linguistics to put these problems in context and understand their challenges. Second, it covers a range of models in structured prediction and deep learning including classifiers, sequence models, statistical parsers, neural network encoders, and encoder-decoder models. We study the models themselves, examples of problems they are applied to, inference methods, parameter estimation, and optimization. Programming assignments involve building scalable machine learning systems for various NLP tasks and seeing how these models can be put into practice.
Prerequisites
Lectures are 9:30-11:00am Tuesday and Thursday in GDC 5.302. A complete schedule of lectures and assignments, including readings, is on the main course website.
There are five assignments in the course: two "mini" assignments, two projects, and a final project. The timeline of these assignments is on the course calendar. Assignment specifications, code, and data will be made available on the course website and Canvas.
Religious Holy Days: A student who is absent from an examination or cannot meet an assignment deadline due to the observance of a religious holy day may take the exam on an alternate day or submit the assignment up to 24 hours late without penalty, provided proper notice of the planned absence has been given. Notice must be given at least 14 days before the classes that will be missed. For religious holy days that fall within the first 2 weeks of the semester, notice should be given on the first day of the semester. Notice should either be delivered in person to the instructor, who will sign and date it, or sent by email, in which case the student must receive email confirmation from the instructor.
The assignments will feature a combination of written questions and coding tasks of varying scope. Detailed instructions for completing and submitting each assignment are given with that assignment.
Late Assignments: For each day an assignment is late, 15% of the credit for that assignment will be deducted. For example, an assignment turned in two days late will automatically lose 30% before any points are taken off for mistakes. Additional extensions may be granted in cases of medical or other emergencies, but they must be agreed on with the course staff before the assignment's original due date. Religious holy days are covered by the policy listed above.
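To make the late-penalty arithmetic concrete, here is a minimal Python sketch. It rests on an interpretation that the policy does not state explicitly: the 15% per day is taken from the assignment's maximum points, days late is a whole number, mistake deductions are subtracted afterwards, and the score never drops below zero. The function `late_adjusted_score` is hypothetical and not part of any course code.

```python
def late_adjusted_score(max_points: float, days_late: int, mistake_deductions: float = 0.0) -> float:
    """Hypothetical helper: apply the 15%-per-day late penalty, then mistake deductions.

    Assumption (not stated in the syllabus): the penalty is a fraction of the
    maximum points, and the final score is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Work in integer percentages to avoid floating-point rounding surprises.
    penalty_percent = 15 * days_late
    remaining_percent = max(0, 100 - penalty_percent)
    available = max_points * remaining_percent / 100
    # Points lost for mistakes come off after the late penalty.
    return max(0.0, available - mistake_deductions)


# Two days late on a 100-point assignment, with 10 points off for mistakes:
print(late_adjusted_score(100, 2, 10))  # -> 60.0
```

Under this reading, an assignment seven or more days late earns no credit, since the accumulated penalty reaches 100%.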
There will be one in-class midterm, as described on the course calendar. Students will be allowed one standard letter-size (8.5" x 11") page of notes during the exam. Use of electronic communication devices (phones, laptops, e-readers, etc.) is banned during the exam.
The final project is an opportunity for more open-ended exploration of concepts in the course. This can be completed individually or in groups of 2; working in groups is encouraged! Several options will be provided. Students will write a brief 1-page proposal describing what they plan to do and how they plan to do it, on which the course staff will provide feedback.
Your final grade is computed based on the total points earned across all assignments. The final grade is mapped to a letter as follows, with grades on the boundary receiving the higher grade:
Grade | Score range
A | 100 - 93.3
A- | 93.3 - 90.0
B+ | 90.0 - 86.6
B | 86.6 - 83.3
B- | 83.3 - 80.0
C+ | 80.0 - 76.6
C | 76.6 - 73.3
C- | 73.3 - 70.0
D | 70.0 - 65.0
F | below 65.0
Depending on class performance, the instructors may shift these boundaries down to raise students' grades.
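The mapping from total points to a letter grade can be sketched in Python as a scan over the lower cutoffs from the grading table, using `>=` so that a score exactly on a boundary receives the higher grade. The names `GRADE_CUTOFFS` and `letter_grade` are hypothetical, and the sketch does not model any downward shift of boundaries the instructors may apply.

```python
# Inclusive lower bound of each letter grade, highest first (from the grading table).
GRADE_CUTOFFS = [
    (93.3, "A"),
    (90.0, "A-"),
    (86.6, "B+"),
    (83.3, "B"),
    (80.0, "B-"),
    (76.6, "C+"),
    (73.3, "C"),
    (70.0, "C-"),
    (65.0, "D"),
]


def letter_grade(score: float) -> str:
    """Map a final score (0-100) to a letter; a score on a boundary gets the higher grade."""
    for cutoff, letter in GRADE_CUTOFFS:
        # >= means a score exactly at a cutoff (e.g. 90.0) earns the higher letter.
        if score >= cutoff:
            return letter
    return "F"  # below 65


print(letter_grade(90.0))   # -> A-
print(letter_grade(89.99))  # -> B+
```

Because the cutoffs are checked from highest to lowest, the first cutoff the score reaches determines the letter.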
Please read the department's academic honesty policies. For this course, students should complete all assignments independently, excluding the final project, which may be completed in groups. Limit any discussion of assignments with other students to clarification of the requirements or definitions of the problems, or to understanding the existing code or general course material. Never discuss issues directly relevant to problem solutions with other students. Finally, note that you may not use external resources (e.g., code on GitHub that does the assigned task) except where explicitly authorized by the course staff.
Be sure you respect these policies when posting on Piazza. Asking clarifying questions, addressing possible bugs in the provided code, etc. are fair game, but you should not discuss solutions in a substantive way. When in doubt, post privately to the instructors.
Students who violate these policies may receive a failing grade on the assignment in question or for the course overall, depending on the instructors' judgment and the severity of the infraction.
Disabilities: Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities at 512-471-6259.