CS394R: Reinforcement Learning: Theory and Practice -- Spring 2011: Assignments Page
Week 1 (1/18,20): Class Overview, Introduction
Jump to the resources page.
Week 2 (1/25,27): Evaluative Feedback
Week 3 (2/1,3): The Reinforcement Learning Problem
Week 4 (2/8,10): Dynamic Programming
Week 5 (2/15,17): Monte Carlo Methods
Week 6 (2/22,24): Temporal Difference Learning
Week 7 (3/1,3): Eligibility Traces
Week 8 (3/8,10): Generalization and Function Approximation
Week 9 (3/22,24): Planning and Learning
Chapter 9 of the textbook (due Tuesday)
Week 10 (3/29,31): Game Playing
Due Tuesday:
Tesauro, G. Temporal Difference Learning and TD-Gammon. Communications
of the ACM, 1995.
Pollack, J.B., & Blair, A.D. Co-evolution in the successful learning of
backgammon strategy. Machine Learning, 1998.
Tesauro, G. Comments on Co-Evolution in the Successful Learning of
Backgammon Strategy. Machine Learning, 1998.
Due Thursday:
Bandit based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári
In: ECML-06. Number 4212 in LNCS
S. Gelly and D. Silver. Achieving Master-Level Play in 9x9 Computer
Go. In Proceedings of the 23rd Conference on Artificial Intelligence,
Nectar Track (AAAI-08), 2008.
Week 11 (4/5,7): Efficient model-based learning
Due Tuesday:
R-Max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
Ronen Brafman and Moshe Tennenholtz
The Journal of Machine Learning Research
Due Thursday:
Efficient Structure Learning in Factored-state MDPs
Alexander L. Strehl, Carlos Diuk, and Michael L. Littman
AAAI'2007
Week 12 (4/12,14): Abstraction: Options and Hierarchy
Due Tuesday:
Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
Sutton, R.S., Precup, D., Singh, S.
Artificial Intelligence 112:181-211, 1999.
Due Thursday:
The MAXQ Method for Hierarchical Reinforcement Learning.
Thomas G. Dietterich
Proceedings of the 15th International Conference on Machine Learning, 1998.
Week 13 (4/19,21): Robotics applications
Due Tuesday:
Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion.
Nate Kohl and Peter Stone
In Proceedings of the IEEE International
Conference on Robotics and Automation, May 2004.
Making a Robot Learn to Play Soccer Using Reward and Punishment.
Heiko Müller, Martin Lauer, Roland Hafner, Sascha Lange, Artur
Merke and Martin Riedmiller.
30th Annual German Conference on AI, KI 2007.
Due Thursday:
Autonomous helicopter flight via reinforcement learning.
Andrew Ng, H. Jin Kim, Michael Jordan and Shankar Sastry.
In S. Thrun, L. Saul, and B. Schoelkopf (Eds.), Advances in Neural Information Processing Systems (NIPS) 16, 2004.
Week 14 (4/26,28): Least squares methods
Due Tuesday:
Technical update: Least-squares temporal difference learning
Justin A. Boyan
Model-Free Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Proceedings of NIPS*2001: Neural Information Processing Systems:
Natural and Synthetic
Vancouver, BC, December 2001, pp. 1547-1554.
Week 15 (5/3,5): Multiagent RL
Due Tuesday:
Kok, J.R. and Vlassis, N. Collaborative multiagent reinforcement
learning by payoff propagation. The Journal of Machine Learning
Research, 7:1789-1828, 2006.
Due Thursday:
Michael Littman, Markov Games as a Framework for Multi-Agent Reinforcement Learning, ICML, 1994.
Final Project: due at 12:30pm on Thursday, 5/5
Page maintained by Peter Stone