CS394R and ECE381V: Reinforcement Learning: Theory and Practice -- Spring 2024

Instructors: Amy Zhang and Peter Stone
Departments of Computer Science and Electrical and Computer Engineering

Tuesday, Thursday 9:30-11:00am
GDC 2.216

This course has a large capacity, so there should be room for students in other departments to register once the semester is underway.
If you're interested in taking the course, please plan on coming to the initial class sessions.

IMPORTANT:

The first reading assignment is due by 5pm on Wednesday 1/17.

Please register for the course on edX Edge. You may need to sign up for an edX Edge account (note that this is not the same as an edX account).

Please check that you can access the course Canvas.

Jump to the assignments page. Jump to the resources page. Jump to the textbook page. Jump to the project page.

To leave feedback for the instructors (anonymously or otherwise), use the course evaluation survey.

Instructor Contact Information

Amy Zhang
office hours: Wednesdays 2pm-3pm and by appointment
office: EER 6.878
email: amyzhang@cs.utexas.edu

Peter Stone
office hours: Varies week to week (see discussion board) and by appointment
office: GDC 3.508
phone: 471-9796
fax: 471-8885
email: pstone@cs.utexas.edu

Teaching Assistants

Caroline Wang
office hours: Mon 5-6pm - cancelled 2/26
Zoom
email: caroline.l.wang@utexas.edu

Haoran Xu
office hours: Thurs 11am-12pm
Zoom
email: haoran.xu@utexas.edu

Shuozhe Li
office hours: Fri 4-5pm
Zoom
email: shuozhe.li@utexas.edu

Siddhant Agarwal
office hours: Thurs 5-6pm
Zoom
email: siddhant@cs.utexas.edu

Michael Munje
office hours: Mon 1-2pm
location: GDC 3.504B
email: michaelmunje@utexas.edu

Course Description

"The idea that we learn by interacting with our environment is probably the first to occur to us when we think about the nature of learning. When an infant plays, waves its arms, or looks about, it has no explicit teacher, but it does have a direct sensori-motor connection to its environment. Exercising this connection produces a wealth of information about cause and effect, about the consequences of actions, and about what to do in order to achieve goals. Throughout our lives, such interactions are undoubtedly a major source of knowledge about our environment and ourselves. Whether we are learning to drive a car or to hold a conversation, we are all acutely aware of how our environment responds to what we do, and we seek to influence what happens through our behavior. Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence."

"Reinforcement learning problems involve learning what to do --- how to map situations to actions --- so as to maximize a numerical reward signal. In an essential way these are closed-loop problems because the learning system's actions influence its later inputs. Moreover, the learner is not told which actions to take, as in many forms of machine learning, but instead must discover which actions yield the most reward by trying them out. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These three characteristics --- being closed-loop in an essential way, not having direct instructions as to what actions to take, and where the consequences of actions, including reward signals, play out over extended time periods --- are the three most important distinguishing features of the reinforcement learning problem."

These two paragraphs from Chapter 1 of the course textbook describe the topic of this course. The course is a graduate-level class. There will be assigned readings and class discussions and activities. It will cover the first 13 chapters of the (2nd edition of the) course textbook plus Chapter 16. Beyond that, we will move to more advanced and/or recent readings from the field, with the aim of focusing on the practical successes and challenges of reinforcement learning.

In addition to the readings, there will be one exam, some problem sets, programming assignments, and a final project. Students will be expected to be proficient programmers.

Prerequisites

Strong programming skills and knowledge of probability are required. Some background in artificial intelligence is recommended.

Text

The course textbook is:
Reinforcement Learning: An Introduction.
By Richard S. Sutton and Andrew G. Barto.
MIT Press, Cambridge, MA, 2018.
Note that the book is available on-line, though if you take the course, it's probably a book you'll want for your bookshelf.

Assignments

Reading, written, and programming assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted. But the readings and exercises may change up until the Wednesday of the week before they are due (1 week in advance).

Resources

Slides from class and other relevant links and information are on the resources page. If you find something that should be added there, please email it to the instructors and/or TAs.

Discussion Forum

While the professors and the TAs will be glad to answer any questions you have, you will often find your peers to be an equally important resource in this class.

Course Requirements

Grades will be based on:

Written responses to the readings and other class participation (10%):

By 2pm on the afternoon before a class with a new reading assignment due, everyone must submit a brief question or comment about the readings in the Readings Response section on the edX course page. Please include your name and eid in the response. In some cases, specific questions may be posted along with the readings. But in general, it is free form. Credit will be based on evidence that you have done the readings carefully. Acceptable responses include (but are not limited to):

Insightful questions;

Clarification questions about ambiguities;

Comments about the relation of the reading to previous readings;

Solutions to problems or exercises posed in the readings;

Critiques;

Thoughts on what you would like to learn about in more detail;

Possible extensions or related studies;

Thoughts on the paper's importance; and

Summaries of the most important things you learned.

Example successful responses from a previous class are available on the sample responses page.

These responses will be graded on a 10-point scale with a grade of 10 being a typical full-credit grade for a reasonable response. Responses will be due by 2pm on Monday. No late responses will be accepted.

Responses must be original. They may not be copied verbatim from the textbook or anywhere else, nor should they be generated by AI tools. You may use such tools to improve the language of your responses, but if you do, please include your original words beneath your revised submission.

This deadline is designed both to encourage you to do the readings before class and also to allow us to incorporate some of your responses into the class discussions.

Students are expected to be present in class having completed the readings and participate actively in the discussions and activities.

Multiple choice and short answer exercises (15%):

There will be a series of multiple choice and/or short answer questions on edX to complete during the first 8 weeks of the semester. You will be able to submit these multiple times until you get the right answer, so they will essentially become a completion grade. But the midterm will have a format similar to these questions (though somewhat more difficult), so you will benefit from completing them carefully and making sure you fully understand the answers.

We will not be granting extensions for these written assignments, but it is always possible to do the assignments late up to the end of the semester for 80% of the credit. Please submit late assignments using the corresponding LATE submission assignment on edX.

Programming exercises (25%):

Each student will be required to complete a series of minor programming assignments. These exercises will not involve extensive or elaborate programs. The emphasis is on empirically analyzing various learning algorithms and reporting on the results. They will be auto-graded. Details are on the class edX page.
For the programming assignments, students may not use any example code found on the web or from any other source, especially for the concepts that are being covered by the assignment. If there is general purpose "plumbing" code that you would like to use, please check first with the course staff.

Just like the short answer exercises, we will not be granting extensions for programming assignments, but it is always possible to do the assignments late up to the end of the semester for 80% of the credit. Please submit late programming assignments using the corresponding LATE submission assignment on edX.

Midterm Exam (20%):

There will be a midterm exam during week 9 of the semester, covering the material from the textbook.

Final Project (30%):

There will be a final project due by TBA at 11:59pm. For each day late, 1 point out of 100 will be deducted from the project grade.
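
As a rough illustration of how the weights and late penalties above might combine, here is a minimal sketch. It is not the official grade calculation. It assumes every component is rescaled to a 0-100 score, that the 80% late credit simply multiplies the score a late exercise would otherwise earn, and that the project deduction is floored at 0. The function names and the example scores are hypothetical.

```python
# Weights taken from the "Grades will be based on" list (they sum to 100%).
WEIGHTS = {
    "responses": 0.10,
    "exercises": 0.15,
    "programming": 0.25,
    "midterm": 0.20,
    "project": 0.30,
}


def late_exercise_score(score, late):
    """Assumption: late work earns 80% of the score it would otherwise earn."""
    return score * 0.8 if late else score


def project_score(score, days_late):
    """Deduct 1 point out of 100 per day late; the floor at 0 is an assumption."""
    return max(0.0, score - days_late)


def course_grade(scores):
    """Weighted average of component scores, each on an assumed 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: exercises submitted late, project two days late.
example = {
    "responses": 100,
    "exercises": late_exercise_score(100, late=True),
    "programming": 90,
    "midterm": 85,
    "project": project_score(92, days_late=2),
}
print(round(course_grade(example), 2))
```

Under these assumptions, submitting all of the exercises late costs at most 3 points of the overall grade (20% of a 15% component), and each late day on the project costs 0.3 points overall.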

Extension Policy

If you turn in your assignment late, expect points to be deducted. No exceptions will be made for the written responses to readings-based questions (subject to the "notice about missed work due to religious holy days" below). For other assignments, TBA.

The greater the advance notice of a need for an extension, the greater the likelihood of leniency.

Academic Dishonesty Policy

You are encouraged to discuss the readings and concepts with classmates. But all written work must be your own. And programming assignments must be your own except for 2-person teams when teams are authorized. All work ideas, quotes, and code fragments that originate from elsewhere must be cited according to standard academic practice. Students caught cheating will automatically fail the course. If in doubt, look at the departmental guidelines and/or ask.

LLM Policy

Please remember that the ideas presented in the reading response must be your own. AI tools such as ChatGPT, Bard, and similar offerings can be used for polishing only. If you use such a tool, include the initial (raw) version of your response below the polished version, and state explicitly which tool(s) you used to improve your language and how. This gives us a trail we can verify if there is any ambiguity about originality.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 471-6529; 471-4641 TTY. If they certify your needs, I will work with you to make appropriate arrangements.

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

Related Courses

Previous offering of this course (Spring 2022)
The one before that (Fall 2019)
The one before that (Fall 2016)
The one before that (Spring 2013)
The one before that (Spring 2011)
The one before that (Fall 2007)
The one before that (Fall 2004)
Chi Jin's course at Princeton
Emma Brunskill's course at Stanford
Michael Littman's course at Brown
Sergey Levine's course at Berkeley
David Silver's course at UCL
Yishay Mansour's course at Tel Aviv University

UTCS Reinforcement Learning Reading Group

The UTCS Reinforcement Learning Reading Group is a student-run group that meets bi-weekly to discuss papers related to reinforcement learning. The RL Reading Group web page also provides a repository of past readings.
Here's an RL reading list from Shivaram Kalyanakrishnan.

Page maintained by Peter Stone
Questions? Send me mail