Instructors: Scott Niekum and Peter Stone
Department of Computer Science
Tuesday, Thursday 9:30-11:00am
GDC 2.216
Scott Niekum
office hours: Tuesdays 3pm-4pm and by appointment
office: GDC 3.404
phone: 232-74741
fax: 471-8885
email: sniekum@cs.utexas.edu
Peter Stone
office hours: Tuesdays 1:10pm-2pm and by appointment
office: GDC 3.508
phone: 471-9796
fax: 471-8885
email: pstone@cs.utexas.edu
William Macke
office hours: Thursday 3-4 PM and Friday 1-2 PM CST
email: wmacke@utexas.edu
Zoom Link
Harshit Sikchi
office hours: Wednesday 3-4 PM CST (on zoom)
office: GDC 3.408B
email: hsikchi@utexas.edu
Zoom Link
Rohan Nair
office hours: Monday and Wednesday 2-3 PM CST
email: rohan.nair@utexas.edu
Zoom Link
Zizhao Wang
office hours: Thursday 2-3 PM CST (on zoom)
email: zizhao.wang@utexas.edu
Zoom Link
"Reinforcement learning problems involve learning what to do --- how to map situations to actions --- so as to maximize a numerical reward signal. In an essential way these are closed-loop problems because the learning system's actions influence its later inputs. Moreover, the learner is not told which actions to take, as in many forms of machine learning, but instead must discover which actions yield the most reward by trying them out. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These three characteristics --- being closed-loop in an essential way, not having direct instructions as to what actions to take, and where the consequences of actions, including reward signals, play out over extended time periods --- are the three most important distinguishing features of the reinforcement learning problem."
These two paragraphs from Chapter 1 of the course textbook describe the topic of this course. The course is a graduate-level class. There will be assigned readings and class discussions and activities. It will cover the first 13 chapters of the (2nd edition of the) course textbook plus Chapter 16. Beyond that, we will move to more advanced and/or recent readings from the field, with the aim of focusing on the practical successes and challenges relating to reinforcement learning.
In addition to the readings, there will be one exam, some problem sets, programming assignments, and a final project. Students will be expected to be proficient programmers.
Strong programming skills and knowledge of probability are required. Some background in artificial intelligence is recommended.
Reading, written, and programming assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted, but the readings and exercises may change up until the Wednesday of the week before they are due (1 week in advance).
Slides from class and other relevant links and information are on the resources page. If you find something that should be added there, please email it to the instructors and/or TAs.
While the professors and the TAs are glad to answer any questions you have, you will often find your peers to be an equally important resource in this class.
Please subscribe to our class Piazza page.
Grades will be based on:
Written responses to the readings-based questions will be graded on a 10-point scale, with a grade of 10 being a typical full-credit grade for a reasonable response. Responses will be due by 5pm on Monday. No late responses will be accepted.
This deadline is designed both to encourage you to do the readings before class and also to allow us to incorporate some of your responses into the class discussions.
Students are expected to be present in class, having completed the readings, and to participate actively in the discussions and activities.
If you turn in your assignment late, expect points to be deducted. No exceptions will be made for the written responses to readings-based questions (subject to the "notice about missed work due to religious holy days" below). The late policy for other assignments is TBA.
The greater the advance notice of a need for an extension, the greater the likelihood of leniency.
You are encouraged to discuss the readings and concepts with classmates, but all written work must be your own. Programming assignments must also be your own, except for 2-person teams when teams are authorized. All ideas, quotes, and code fragments that originate from elsewhere must be cited according to standard academic practice. Students caught cheating will automatically fail the course. If in doubt, look at the departmental guidelines and/or ask.
The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 471-6529; 471-4641 TTY. If they certify your needs, I will work with you to make appropriate arrangements.
A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.
Page maintained by Peter Stone
Questions? Send me mail