CS394R: Reinforcement Learning: Theory and Practice -- Spring 2011: Assignments Page

Assignments for Reinforcement Learning: Theory and Practice

Week 1 (1/18,20): Class Overview, Introduction

Chapter 1 of the textbook (due Thursday)

For each reading, be sure to submit a question or comment about the reading by 9pm on the day before class as an email in plain ASCII text. I prefer that it be sent in the body of the email rather than as an attachment. Please use the subject line "class readings for [due date]" and send it to Peter and Doran (pstone@cs and doran.chakraborty@gmail). Please include your name in the response. If you refer explicitly to the reading, please include page numbers.

Week 2 (1/25,27): Evaluative Feedback

Jump to the resources page.

Chapter 2 of the textbook (due Tuesday)

Week 3 (2/1,3): The Reinforcement Learning Problem

Chapter 3 of the textbook (due Tuesday)

Week 4 (2/8,10): Dynamic Programming

Chapter 4 of the textbook (due Tuesday)

Week 5 (2/15,17): Monte Carlo Methods

Chapter 5 of the textbook (due Tuesday)

Week 6 (2/22,24): Temporal Difference Learning

Chapter 6 of the textbook (due Tuesday)

Week 7 (3/1,3): Eligibility Traces

Chapter 7 of the textbook (due Tuesday)

Week 8 (3/8,10): Generalization and Function Approximation

Chapter 8 of the textbook (due Tuesday)

Class project proposal due at 12:30pm on Thursday. Please send an email with subject "Project Proposal" with a proposed topic for your class project. I anticipate projects taking one of two forms.

Practice (preferred): An implementation of RL in some domain of your choice - ideally one that you are using for research or in some other class. In this case, please describe the domain and your initial plans on how you intend to implement learning. What will the states and actions be? What algorithm(s) do you expect will be most effective?

Theory: A proposal, implementation and testing of an algorithmic modification to an RL algorithm presented in the book. In this case, please describe the modification you propose to investigate and on what type of domain (possibly a toy domain) it is likely to show an improvement over things considered in the book.

Week 9 (3/22,24): Planning and Learning

Chapter 9 of the textbook (due Tuesday)

Week 10 (3/29,31): Game Playing

Due Tuesday:

Tesauro, G. Temporal Difference Learning and TD-Gammon. Communications of the ACM, 1995.

Pollack, J.B., & Blair, A.D. Co-evolution in the successful learning of backgammon strategy. Machine Learning, 1998

Tesauro, G. Comments on Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning, 1998.

Due Thursday:

Kocsis, L., & Szepesvari, C. Bandit based Monte-Carlo Planning. In: ECML-06. Number 4212 in LNCS.

S. Gelly and D. Silver. Achieving Master-Level Play in 9x9 Computer Go. In Proceedings of the 23rd Conference on Artificial Intelligence, Nectar Track (AAAI-08), 2008.

Week 11 (4/5,7): Efficient model-based learning

Due Tuesday:

R-Max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
Ronen Brafman and Moshe Tennenholtz
The Journal of Machine Learning Research

Due Thursday:

Efficient Structure Learning in Factored-state MDPs
Alexander L. Strehl, Carlos Diuk, and Michael L. Littman
AAAI'2007

Week 12 (4/12,14): Abstraction: Options and Hierarchy

Due Tuesday:

Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
Sutton, R.S., Precup, D., Singh, S.
Artificial Intelligence 112:181-211, 1999.

Due Thursday:

The MAXQ Method for Hierarchical Reinforcement Learning.
Thomas G. Dietterich
Proceedings of the 15th International Conference on Machine Learning, 1998.

Week 13 (4/19,21): Robotics applications

Due Tuesday:

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion.
Nate Kohl and Peter Stone
In Proceedings of the IEEE International Conference on Robotics and Automation, May 2004.

Making a Robot Learn to Play Soccer Using Reward and Punishment.
Heiko Müller, Martin Lauer, Roland Hafner, Sascha Lange, Artur Merke and Martin Riedmiller.
30th Annual German Conference on AI, KI 2007.

Due Thursday:

Autonomous helicopter flight via reinforcement learning.
Andrew Ng, H. Jin Kim, Michael Jordan and Shankar Sastry.
In S. Thrun, L. Saul, and B. Schoelkopf (Eds.), Advances in Neural Information Processing Systems (NIPS) 17, 2004.

Week 14 (4/26,28): Least squares methods

Due Tuesday:

Technical update: Least-squares temporal difference learning.
Justin A. Boyan

Model-Free Least-Squares Policy Iteration.
Michail G. Lagoudakis and Ronald Parr
Proceedings of NIPS*2001: Neural Information Processing Systems: Natural and Synthetic, Vancouver, BC, December 2001, pp. 1547-1554.

Week 15 (5/3,5): Multiagent RL

Due Tuesday:

Kok, J.R. and Vlassis, N., Collaborative multiagent reinforcement learning by payoff propagation, The Journal of Machine Learning Research, 7, 1828, 2006.

Due Thursday:

Michael Littman, Markov Games as a Framework for Multi-Agent Reinforcement Learning, ICML, 1994.

Final Project: due at 12:30pm on Thursday, 5/5

Page maintained by Peter Stone
Questions? Send me mail