CS388: Natural Language Processing (Spring 2024)

Instructor: Greg Durrett, gdurrett@cs.utexas.edu
Lecture: Tuesday and Thursday 12:30pm - 1:45pm, GDC 4.302
Instructor Office Hours: Wednesdays 1:15pm-2:15pm (starting 1/24), Thursdays at 10am (starting 1/18), on Zoom and in GDC 3.812 (hybrid)
TA: Anisha Gunjal
TA Office Hours:

Tuesday 3pm, GDC 1st floor TA Desk 4 + on Zoom
Wednesday 3pm, on Zoom only

Description

This class is a graduate-level introduction to Natural Language Processing (NLP), the study of computing systems that can process, understand, or communicate in human language. The course covers fundamental approaches, particularly deep learning and language model pre-training, used across the field of NLP, as well as a comprehensive set of NLP tasks both historical and contemporary. Techniques studied include basic classification techniques, feedforward neural networks, attention mechanisms, pre-trained large language models (BERT-style encoders and GPT-style LLMs), and structured models (sequences, trees, etc.). Problems range from syntax (part-of-speech tagging, parsing) to semantics (lexical semantics, question answering, grounding) and include various applications such as summarization, machine translation, information extraction, and dialogue systems. Programming assignments throughout the semester involve building scalable machine learning systems for several of these NLP tasks.

Requirements

391L - Machine Learning, 343 - Artificial Intelligence, or equivalent AI/ML course experience
Familiarity with Python for programming assignments
Additional prior exposure to discrete math, probability, linear algebra, optimization, linguistics, and NLP useful but not required

Syllabus [Clickable link with important information about the course policies. The current page you are on is NOT the complete syllabus]

Assignments: See syllabus for more details about these.

Project 1: Linear and Neural Sentiment Classification [code and dataset download]

Project 2: Transformer Language Modeling [code and dataset download]

Project 3: Dataset Biases

Final Project [sample 1 (reproduction study by 2 students)]; [sample 2 (original research by 1 student)]

Readings: Textbook readings are assigned to complement the material discussed in lecture. You may find it useful to do these readings before lecture as preparation or after lecture to review, but you are not expected to know everything discussed in the textbook if it isn't covered in lecture. Paper readings are intended to supplement the course material if you are interested in diving deeper into particular topics. Bold readings and videos are most central to the course content; it's recommended that you look at these.

The chief text in this course is Eisenstein: Natural Language Processing, available as a free PDF online. For deep learning techniques, this text will be supplemented with selections from Goldberg: A Primer on Neural Network Models for Natural Language Processing. (Another generally useful NLP book is Jurafsky and Martin: Speech and Language Processing (3rd ed. draft), with many draft chapters available for free online; however, we will not be using it much for this course.)

Date	Topics	Readings	Assignments
Jan 16	Introduction [4pp]		P1 out
Jan 18	Binary Classification [4pp]	Eisenstein 2.0-2.5, 4.2-4.4.1 Perceptron and logistic regression
Jan 23	Multiclass Classification [4pp]	Eisenstein 4.2 Multiclass lecture note
Jan 25	Neural 1: Feedforward [4pp]	Eisenstein 3.0-3.3 Botha+17 FFNNs Iyyer+15 DANs ffnn_example.py
Jan 30	Neural 2: Word Embeddings, Bias in Embeddings [4pp]	Eisenstein 3.3.4, 14.5-14.6 Goldberg 5 Mikolov+13 word2vec Pennington+14 GloVe Levy+14 Matrix Factorization Grave+17 fastText Burdick+18 Instability Bolukbasi+16 Gender Gonen+19 Debiasing	P1 due / P2 out
Feb 1	Neural 3: Language Modeling, Attention [4pp]	Bengio+03 NPLM Luong+15 Attention Vaswani+17 Transformers Alammar Illustrated Transformer PhuongHutter Transformers
Feb 6	Neural 4: Transformers [4pp]	Vaswani+17 Transformers Alammar Illustrated Transformer Kaplan+20 Scaling Laws Beltagy+20 Longformer Choromanski+21 Performer Tay+20 Efficient Transformers
Feb 8	Pre-training 1: Encoders (BERT), Tokenization [4pp]	Peters+18 ELMo Devlin+19 BERT Alammar Illustrated BERT Liu+19 RoBERTa Clark+20 ELECTRA He+21 DeBERTa BostromDurrett20 Tokenizers	FP proposals out
Feb 13	Pre-training 2: Decoders (GPT/T5), Decoding Methods [4pp]	Raffel+19 T5 Lewis+19 BART Radford+19 GPT2 Brown+20 GPT3 Chowdhery+21 PaLM Holtzman+19 Nucleus Sampling	P2 due
Feb 15	Evaluation in NLP, Datasets, Dataset Bias, Statistical Significance [4pp]	Wang+19 SuperGLUE BIGBench Gururangan+18 Artifacts McCoy+19 Right Gardner+20 Contrast Swayamdipta+20 Cartography Utama+20 Debiasing
Feb 20	Understanding GPT3 1: Prompting, Interpreting GPT-3 [4pp]	Zhao+21 Calibrate Before Use Min+22 Rethinking Demonstrations Gonen+22 Demystifying Prompts Xie+21 ICL as Implicit Bayesian Inference Akyurek+22 ICL regression Olsson+22 Induction Heads	FP proposals due / P3 out
Feb 22	Understanding GPT3 2: Rationales, Chain-of-thought [4pp]	Camburu+18 e-SNLI Wei+22 CoT YeDurrett22 Unreliability Kojima+22 Step-by-step Gao+22 Program-aided Ye+22 Complementary Ye+23 SatLM
Feb 27	Understanding GPT3 3: Instruction tuning, RL in NLP [4pp]	Sanh+21 T0 Liu+21 Prompting Chung+22 Flan-PaLM Ouyang+22 Human Feedback Ramamurthy+22 RL for NLP Rafailov+23 DPO Singhal+23 Length
Feb 29	Interpretability [4pp]	Lipton+16 Mythos Ribeiro+16 LIME Simonyan+13 Visualizing Sundararajan+17 Int Grad Interpretation Tutorial
Mar 5	Sequence Tagging [4pp]	Eisenstein 7, 8 Manning+11 POS Sutton CRFs 2.3, 2.6.1 Wallach CRFs
Mar 7	No class (Greg traveling)		P3 due
Mar 12	NO CLASS
Mar 14	NO CLASS
Mar 19	Trees 1: Constituency, PCFGs [4pp]	Eisenstein 10.0-10.5 JM 12.1-12.6, 12.8 KleinManning13 Structural Collins97 Lexicalized
Mar 21	Trees 2: Dependency, Shift-reduce, State-of-the-art Parsers [4pp]	Eisenstein 11.1-11.3 JM 13.1-13.3, 13.5 Dozat+17 Dependency JM 13.4 Andor+16 Parsey KitaevKlein18 KitaevKlein20 Linear-time
Mar 26	Apps 1: Question Answering [4pp]	Eisenstein 12 Chen+17 DrQA Lee+19 Latent Retrieval Guu+20 REALM Kwiatkowski+19 NQ Dua+19 DROP Nakano+21 WebGPT Choi+18 QuAC
Mar 28	Apps 2: Machine Translation [4pp]	Eisenstein 18.1-18.2, 18.4 Michael Collins IBM Models 1+2 JHU slides History of MT SennrichZhang19 Low-resource Aji+20 Transfer Liu+20 mBART Kocmi+23 LLMs for Eval
April 2	Apps 3: Language and Code [4pp]	ZettlemoyerCollins05 Berant+13 JiaLiang16 Recomb Wei+20 Type Inference Wang+21 CodeT5 Chen+21 Codex
April 4	Efficiency in LLMs: Decoding, Pruning, Training [4pp]	Leviathan+23 Speculative Medusa Heads (blog) Dao+23 Flash Attention Xia+23 Sheared Llama Sanh+19 DistilBERT Hsieh+23 Distill Step-by-Step Zaken+22 BitFit Hu+21 LoRA Dettmers+22 LLM.int8() Dettmers+23 QLoRA	Check-ins due
April 9	Language grounding [4pp]	Radford+21 CLIP Ahn+22 SayCan Driess+23 PaLM-E
April 11	Morphology / LLM Safety [4pp]	Shen+23 Jailbreaking Zou+23 Attacks on LLMs EldanRussinovich23 Unlearning Mitchell+22 Model Editing Onoe+23 Challenges in Propagating
April 16	LLMs, Society, and Ethics of NLP [4pp]	HovySpruit16 Social Impact of NLP Zhao+17 Bias Amplification Rudinger+18 Gender Bias in Coref BenderGebru+21 Stochastic Parrots Gebru+18 Datasheets for Datasets Raji+20 Auditing
April 18	No class (MLL Symposium)
April 23	FP presentations 1 [4pp]
April 25	FP presentations 2 [4pp]		FP due May 3