CS378: Natural Language Processing (Spring 2019)
NOTE: This page is for an old semester of this class
Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 9:30am - 11:00am, GDC 5.302
Instructor Office Hours: Tuesday 1pm-2pm / Wednesday 11am-12pm, GDC 3.420
TAs: Jiacheng Xu (jcxu@cs.utexas), Shivangi Mahto (shivangi@cs.utexas)
TA Office Hours: Monday/Wednesday 1pm-2pm (Jiacheng) GDC 1.302 Desk 1/2, Thursday 2pm-3pm (Shivangi) GDC 1.302 Desk 3
Piazza
Description
This course provides an introduction to modern natural language processing
using machine learning and deep learning approaches. Content includes
linguistics fundamentals (syntax, semantics, distributional properties of
language), machine learning models (classifiers, sequence taggers, deep
learning models), key algorithms for inference, and applications to a range of
problems. Students will get hands-on experience building systems to do tasks
including text classification, syntactic analysis, language modeling, and language generation.
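To give a flavor of the hands-on work, below is a minimal, illustrative sketch of a unigram bag-of-words perceptron for sentiment classification, one of the tasks named above. This is not course-provided code; the tiny dataset, whitespace tokenizer, and epoch count are placeholders.

```python
# Illustrative sketch (not course code): unigram perceptron for sentiment.
from collections import Counter

# Toy labeled data: 1 = positive sentiment, 0 = negative.
train_data = [
    ("the movie was great and moving", 1),
    ("a dull plot and terrible acting", 0),
    ("great performances all around", 1),
    ("terrible dull and far too long", 0),
]

def featurize(text):
    # Unigram count features from a crude whitespace tokenization.
    return Counter(text.lower().split())

weights = Counter()  # missing features default to weight 0
for epoch in range(10):
    for text, label in train_data:
        feats = featurize(text)
        score = sum(weights[f] * v for f, v in feats.items())
        pred = 1 if score > 0 else 0
        if pred != label:  # perceptron: update weights only on mistakes
            for f, v in feats.items():
                weights[f] += (label - pred) * v

# Classify a new example: positive iff the weighted feature sum is > 0.
test_score = sum(weights[f] * v for f, v in featurize("a great movie").items())
print("positive" if test_score > 0 else "negative")
```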
Requirements
- CS 429
- Recommended: CS 331, familiarity with probability and linear algebra, programming experience in Python
- Helpful: Exposure to AI and machine learning (e.g., CS 342/343/363)
Syllabus
Detailed syllabus with course policies
Assignments:
Assignment 0: Warmup [nyt dataset]
Assignment 1: Sentiment Classification [code and dataset download]
Assignment 2: Feedforward Neural Networks [code and dataset download]
Assignment 3: Sequence Modeling and Parsing [code and dataset on Canvas]
Midterm: topics and practice questions
Assignment 4: Character Language Modeling with RNNs [code and dataset download]
Final Project
Readings: Textbook readings are assigned to complement the material discussed in lecture. You may find it useful
to do these readings before lecture as preparation or after lecture to review.
Paper readings are intended to supplement the course material if you are interested in diving deeper into particular topics.
The chief text in this course is Eisenstein: Natural Language Processing,
available as a free PDF online. For deep learning techniques, this text will be supplemented with selections from Goldberg: A Primer on Neural Network Models for Natural Language Processing.
(Another generally useful NLP book is Jurafsky and Martin: Speech and Language Processing (3rd ed. draft), with many draft chapters available for free online; however,
we will not be using it for this course.)
Readings for future lectures are tentative and subject to change.
Date | Topics | Readings | Assignments
Jan 22 | Introduction [4pp] | | A0 out
Jan 24 | Classification I: Features, Naive Bayes | Eisenstein 2.0 (= intro to ch 2), 2.1, 4.1, 4.3 | A0 due Friday / A1 out
Jan 29 | Classification II: Perceptron, Logistic Regression | Eisenstein 2.2, 2.4, Pang+ Thumbs Up, Wang+Manning |
Jan 31 | Classification III: Multiclass | Eisenstein 2.4.1, 2.5, 4.2, Schwartz+ Authorship |
Feb 5 | Neural I: Feedforward [4pp] | Eisenstein 3.0-3.3, Goldberg 3-4, ffnn_example.py |
Feb 7 | Neural II: Implementation, Word embeddings [4pp] | Eisenstein 3.3, Goldberg 6, ffnn_example.py | A1 due / A2 out
Feb 12 | Neural III: Word embeddings, NNs for NLP [4pp] | Eisenstein 14.5-14.6, Goldberg 5, word2vec, GloVe, NLP with FFNNs, DANs |
Feb 14 | Sequence I: Tagging, POS, HMMs | Eisenstein 7.1-7.4, 8.1, Manning POS |
Feb 19 | Sequence II: HMMs, Viterbi, Beam Search | Eisenstein 7.3-7.4, Viterbi lecture note |
Feb 21 | Sequence III: CRFs, NER / Trees I: Grammar [4pp] | Eisenstein 7.5-7.6, 10.1-10.2 | A2 due / A3 out
Feb 26 | Trees II: PCFGs, CKY | Eisenstein 10.3, 10.4.1 |
Feb 28 | Trees III: Better grammars, Dependency I [4pp] | Eisenstein 10.5, 11.1, Unlexicalized parsing |
Mar 5 | Trees IV: Dependency II | Eisenstein 11.3-11.4 |
Mar 7 | Information Extraction [4pp] | |
Mar 12 | LM I: Ngrams / Midterm review | Eisenstein 6.1-6.2 | A3 due Monday 3/11
Mar 14 | MIDTERM (in-class) | |
Mar 19 | NO CLASS (SPRING BREAK) | |
Mar 21 | NO CLASS (SPRING BREAK) | |
Mar 26 | LM II: LSTMs | Eisenstein 6.3-6.5, Olah Understanding LSTMs | A4 out
Mar 28 | LM III: Impl / MT I: Intro [4pp] | Eisenstein 18.1-18.2, Karpathy Visualizing RNNs |
April 2 | MT II: Phrase-based | Eisenstein 18.2, 18.4, Pharaoh |
April 4 | MT III: Decoding, Seq2seq [4pp] | Eisenstein 18.3 |
April 9 | MT IV: Seq2seq (cont'd), attention [4pp] | Eisenstein 18.3, Attention | A4 due / FP out
April 11 | DIAL I: Chatbots [4pp] | Eisenstein 19.3.3, Diversity, PersonaChat, Alexa Team Gunrock | FP proposal due Friday 4/12
April 16 | DIAL II: Task-oriented [4pp] | Eisenstein 19.3.1-19.3.2 |
April 18 | Neural IV: Transfer Learning [4pp] | ELMo |
April 23 | Neural V: Transformers [4pp] | BERT, GPT, Transformers, Illustrated Transformer |
April 25 | QA I: Semantic representations | Eisenstein 12, Freebase QA |
April 30 | QA II: Semantic parsing [4pp] | Zettlemoyer, Jia |
May 2 | QA III: Reading comprehension | Eisenstein 17.5, Stanford Attentive Reader, SQuAD, BiDAF |
May 7 | Multilingual Methods [4pp] | |
May 9 | Wrapup + Ethics [4pp] | |