Course Specifications for CS 388 Natural Language Processing
- When & Where: Spring 2018, Mon & Wed, 2:00 p.m.-3:30 p.m., GDC 4.302
- Unique Number: 51745
- Professor: Raymond Mooney, GDC 3.512, 471-9558, mooney@cs.utexas.edu
- Office Hours: Mon & Wed, 11am-noon, GDC 3.512
- Teaching Assistant (TA): Ghufran Baig, ghufran@cs.utexas.edu
- TA Office Hours: Tues 1-2 pm; Thurs 2-3 pm, GDC 6.802A
- Online Class Discussion: We will use Piazza for the class discussion forum. Please make sure you are enrolled in this class on Piazza, and post your questions to this discussion board.
- Prerequisites: Basic knowledge of formal-language/automata theory (i.e., regular and context-free grammars), artificial intelligence (i.e., search, logic, and knowledge representation), and Java and Python programming. Knowledge of machine learning will be extremely useful but is not strictly necessary.
- Textbook: Jurafsky and Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition, Prentice Hall, 2008.
- Recommended Supplementary Text: Manning and Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, May 1999.
Course Overview
The intent of the course is to present a fairly broad graduate-level
introduction to Natural Language Processing (NLP, a.k.a. computational
linguistics), the study of computing systems that can process, understand, or
communicate in human language. The primary focus of the course will be on
understanding the various NLP tasks listed on the course syllabus, algorithms
for effectively solving these problems, and methods for evaluating their
performance. There will be a focus on statistical and neural-network learning
algorithms that train on (annotated) text corpora to automatically acquire the
knowledge needed to perform each task. Class lectures will discuss general
issues as well as present abstract algorithms. Implemented versions of some of
the algorithms will be provided to give a feel for how the systems discussed in
class "really work" and to allow for extensions and experimentation as part of
the course projects.
Course Requirements and Grading
Chapters from the text and a few other readings will be assigned throughout the
semester, and each reading should be done before the corresponding class.
Copies of the class lecture slides (in PowerPoint) will be available on the
course home page. There will be about four homework assignments, a midterm
exam, and a final research project.
To encourage and evaluate class participation, at the end of each week, each
student should electronically submit on Canvas a short, insightful question or
comment about that week's lectures and/or reading. These are due the Saturday
after each full class week and are simply graded as "not submitted or bad" (0),
"OK" (1), "good" (2), or "very good" (3). In the first class of the following
week, I will discuss a selection of the "good"/"very good" questions. This
should not discourage questions during class at all; in fact, you are
encouraged to submit a question you already asked in class that week. This
simply gives you an additional chance to think of a good question off-line.
The homework assignments will involve programming that uses and builds upon
existing NLP software packages, as well as running computational experiments
to evaluate and analyze these systems. Programming assignments will use Java,
Python, and/or TensorFlow. If you do not know these tools, you will need to
learn them on your own. You can use your student account on the department
workstations or any other platform available to you (however, we will only
provide support for running on departmental Unix machines). If you are not a CS
student and need a temporary department account, apply for one on the web.
The midterm exam, scheduled during class on Wednesday, March 7, 2018, will
consist of a mix of problem-solving and short-answer questions covering the
material in the first half of the course. For an example, see last year's
midterm on the course home page.
The final project can be a more ambitious experiment or enhancement involving
an existing NLP system, or a new system implementation (in the programming
language of your choice). In either case, the implementation and/or
experiments should be accompanied by a short paper (about 6 to 7 single-spaced
pages) describing the project. An outline for the project report is available
on the course home page. About a month before the project is due, you will be
asked to submit a one-page project proposal.
Late Submission and Cheating Policies
Homework assignments should be completed independently by each student. Any
program code should always be appropriately commented, and the report should
be nicely formatted, using well-designed graphics (graphs, bar charts, etc.)
where appropriate. Assignments are due at the beginning of class on the due
date. To leave time to get to class on time, the deadline for online
submissions is 15 minutes prior to the start of class. Be sure to hand in
assignments on time; late penalties are a loss of a percentage of the original
overall points for the assignment: 1 day: 15%; 2 days: 40%; 3 days: 75%; more
than 3 days: 100%. A day is a 24-hour period starting at the beginning of class
and includes all weekend days and holidays.
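As a rough illustration of the late-penalty schedule, the Python sketch below computes an adjusted homework score. The function name `late_score` is hypothetical. The sketch also assumes that lateness is measured in hours past the online deadline, with any partial 24-hour period counted as a full day. As the policy states, the penalty is deducted as a percentage of the assignment's original overall points, not of the points earned.

```python
import math

# Penalty schedule from the syllabus: percent of the assignment's original
# overall points that is deducted, keyed by the number of days late.
LATE_PENALTY_PERCENT = {0: 0, 1: 15, 2: 40, 3: 75}


def late_score(earned: float, max_points: float, hours_late: float) -> float:
    """Return a homework score after the late penalty (hypothetical helper).

    Assumption: any partial 24-hour period past the deadline counts as a
    full day late; more than 3 days late loses 100% of the points.
    """
    days_late = 0 if hours_late <= 0 else math.ceil(hours_late / 24)
    percent = LATE_PENALTY_PERCENT.get(days_late, 100)
    # The deduction is based on the original overall points, not on the
    # points earned, and the adjusted score never drops below zero.
    return max(0.0, earned - max_points * percent / 100)


if __name__ == "__main__":
    # An 80/100 submission turned in 30 hours late counts as 2 days late,
    # so 40 of the 100 possible points are deducted.
    print(late_score(80, 100, 30))
```

Because the deduction is taken from the total possible points, a weak submission that is several days late can reach zero even before the 100% cutoff.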
The very strong preference is for final projects to be done by teams of two
students; however, projects done by one or three students are possible on rare
occasions with prior approval of the instructor.
Read the department's academic policy page. Students who demonstrably violate
the Academic Honesty policy will receive a failing grade in the class. We will
be using the Moss system to screen submitted programs for plagiarism. Over the
years, I have unfortunately had to fail over twenty students for copying on
programming assignments. To avoid problems, limit any discussion of assignments
with other students to clarification of the requirements or definitions of the
problems, or to understanding the existing programs or general course material.
Never discuss issues directly relevant to problem solutions.
Final Grade
The final grade will be computed as follows:
36% Homeworks
24% Midterm Exam
33% Final Project
7% Class Participation (Questions of the Week)
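The weighted average above can be sketched in Python as follows. The helper name `final_grade` and the component keys are hypothetical. The sketch also assumes that each component score has already been expressed on a 0-100 scale. The syllabus does not say how, for instance, the weekly 0-3 participation marks are converted to such a score.

```python
# Grade weights from the syllabus, in percent; they sum to 100.
WEIGHTS = {
    "homeworks": 36,
    "midterm": 24,
    "final_project": 33,
    "participation": 7,
}


def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores (hypothetical helper).

    Assumes every score in `scores` is already on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Multiply each score by its percent weight, then divide by 100.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items()) / 100


if __name__ == "__main__":
    print(final_grade({"homeworks": 90, "midterm": 80,
                       "final_project": 95, "participation": 100}))
```

Keeping the weights as integer percents and dividing once at the end avoids accumulating small floating-point errors from fractional weights.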