Instructors: Amy Zhang and Peter Stone
Departments of Computer Science and Electrical and Computer Engineering
Tuesday, Thursday 9:30-11:00am
GDC 2.216
Amy Zhang
office hours: Wednesdays 2pm-3pm and by appointment
office: EER 6.878
email: amyzhang@cs.utexas.edu
Peter Stone
office hours: Varies week to week (see discussion board) and by appointment
office: GDC 3.508
phone: 471-9796
fax: 471-8885
email: pstone@cs.utexas.edu
Caroline Wang
office hours: Mon 5-6pm - cancelled 2/26
Zoom
email: caroline.l.wang@utexas.edu
Haoran Xu
office hours: Thurs 11am-12pm
Zoom
email: haoran.xu@utexas.edu
Shuozhe Li
office hours: Fri 4-5pm
Zoom
email: shuozhe.li@utexas.edu
Siddhant Agarwal
office hours: Thurs 5-6pm
Zoom
email: siddhant@cs.utexas.edu
Michael Munje
office hours: Mon 1-2pm
location: GDC 3.504B
email: michaelmunje@utexas.edu
"Reinforcement learning problems involve learning what to do --- how to map situations to actions --- so as to maximize a numerical reward signal. In an essential way these are closed-loop problems because the learning system's actions influence its later inputs. Moreover, the learner is not told which actions to take, as in many forms of machine learning, but instead must discover which actions yield the most reward by trying them out. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These three characteristics --- being closed-loop in an essential way, not having direct instructions as to what actions to take, and where the consequences of actions, including reward signals, play out over extended time periods --- are the three most important distinguishing features of the reinforcement learning problem."
These two paragraphs from chapter 1 of the course textbook describe the topic of this course. The course is a graduate-level class. There will be assigned readings and class discussions and activities. It will cover the first 13 chapters of the (2nd edition of the) course textbook plus Chapter 16. Beyond that, we will move to more advanced and/or recent readings from the field, with the aim of focusing on the practical successes and challenges relating to reinforcement learning.
In addition to the readings, there will be one exam, some problem sets, programming assignments, and a final project. Students will be expected to be proficient programmers.
Strong programming skills and knowledge of probability are required. Some background in artificial intelligence is recommended.
Reading, written, and programming assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted, but the readings and exercises may change up until the Wednesday of the week before they are due (1 week in advance).
Slides from class and other relevant links and information are on the resources page. If you find something that should be added there, please email it to the instructors and/or TAs.
While the professors and the TAs will be glad to answer any questions you have, you will frequently find your peers to be an equally important resource in this class.
Grades will be based on:
These responses will be graded on a 10-point scale with a grade of 10 being a typical full-credit grade for a reasonable response. Responses will be due by 2pm on Monday. No late responses will be accepted.
Responses must be original. They may not be copied verbatim from the textbook or anywhere else, nor should they be generated by AI tools. You may use such tools to improve the language of your responses, but if you do, please include your original words beneath your revised submission.
This deadline is designed both to encourage you to do the readings before class and also to allow us to incorporate some of your responses into the class discussions.
Students are expected to be present in class, having completed the readings, and to participate actively in the discussions and activities.
If you turn in your assignment late, expect points to be deducted. No exceptions will be made for the written responses to readings-based questions (subject to the "notice about missed work due to religious holy days" below). For other assignments, TBA.
The greater the advance notice of a need for an extension, the greater the likelihood of leniency.
You are encouraged to discuss the readings and concepts with classmates. But all written work must be your own. And programming assignments must be your own except for 2-person teams when teams are authorized. All work ideas, quotes, and code fragments that originate from elsewhere must be cited according to standard academic practice. Students caught cheating will automatically fail the course. If in doubt, look at the departmental guidelines and/or ask.
Please remember that the ideas presented in the reading response must be your own. AI tools such as ChatGPT, Bard, and similar offerings can be used for polishing only. If you use such a tool, include the initial (raw) version of your response below the polished version and acknowledge explicitly which tool(s) you used to improve your language and how, so that there is a trail we can verify if there is any ambiguity about originality.
The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 471-6529; 471-4641 TTY. If they certify your needs, I will work with you to make appropriate arrangements.
A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.
Page maintained by Peter Stone