CS394R: Reinforcement Learning: Theory and Practice -- Fall 2007: Resources Page
Resources for
Reinforcement Learning: Theory and Practice
Week 0: Class Overview
Slides from 8/30:
Week 1: Introduction
Slides from 9/4, 9/6:
Week 2: Evaluative Feedback
Slides from 9/11:
Vermorel and Mohri:
Multi-Armed Bandit Algorithms and Empirical Evaluation
Rich Sutton's slides for Chapter 2:
Week 3: The Reinforcement Learning Problem
Slides from 9/18:
The MAXQ Method for Hierarchical Reinforcement Learning
Jong and Stone:
State Abstraction Discovery from Irrelevant State Variables
Rich Sutton's slides for Chapter 3:
Week 4: Dynamic Programming
Slides from 9/25:
Email discussion on the Gambler's problem
A paper on
"The Complexity of solving MDPs"
(Littman, Dean, and Kaelbling, 1995).
Tumer and Agogino:
Distributed Agent-Based Air Traffic Flow Management
Pashenkova, Rish, and Dechter:
Value Iteration and Policy Iteration Algorithms for Markov Decision Problems
Rich Sutton's slides for Chapter 4:
Week 5: Monte Carlo Methods
Slides from 10/2:
A paper that
addresses the relationship between first-visit and every-visit MC
(Singh and Sutton, 1996). For some theoretical relationships, see Section 3.3 onward (and the referenced appendices).
Rich Sutton's slides for Chapter 5:
Week 6: Temporal Difference Learning
Slides from 10/9:
A couple of articles on the details of actor-critic in practice by
and by
Sprague and Ballard:
Multiple-Goal Reinforcement Learning with Modular Sarsa(0)
Rich Sutton's slides for Chapter 6:
Week 7: Eligibility Traces
Slides from 10/16:
The equivalence of MC and first-visit TD(1) is proven in the
same Singh and Sutton paper that's referenced above
(Singh and Sutton, 1996). See Section 2.4 onward.
The Convergence of TD(λ) for General λ
Rich Sutton's slides for Chapter 7:
Week 8: Generalization and Function Approximation
Slides from 10/23:
Dopamine: Generalization and Bonuses
(2002) Kakade and Dayan.
Andrew Smith's
Applications of the Self-Organising Map to Reinforcement Learning
Bernd Fritzke's very clear
Some Competitive Learning Methods
- a nice visual demo of competitive learning
Residual Algorithms: Reinforcement Learning with Function Approximation
(1995) Leemon Baird. More on the Baird counterexample as well as an alternative to doing gradient descent on the MSE.
Boyan, J. A., and A. W. Moore,
Generalization in Reinforcement Learning: Safely Approximating the Value Function.
In Tesauro, G., D. S. Touretzky, and T. K. Leen (eds.), Advances in Neural Information Processing Systems 7 (NIPS). MIT Press, 1995. Another example of function approximation divergence and a proposed solution.
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
(1998) Juan Carlos Santamaria, Richard S. Sutton, Ashwin Ram. Comparisons of several types of function approximators (including instance-based like Kanerva).
Least-Squares Temporal Difference Learning
Justin Boyan.
A Convergent Form of Approximate Policy Iteration
(2002) T. J. Perkins and D. Precup. A new convergence guarantee with function approximation.
On-line calculators of t-tests
Slides on
Decision Trees
from Tom Mitchell's book
Machine Learning
Moore and Atkeson:
The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State Spaces
Sherstov and Stone:
Function Approximation via Tile Coding: Automating Parameter Choice
Chapman and Kaelbling:
Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons
Rich Sutton's slides for Chapter 8:
Week 9: Planning and Learning
et al.
Autonomous helicopter flight via reinforcement learning
Szita and Lörincz:
Learning Tetris Using the Noisy Cross-Entropy Method
et al.
Model-based Reinforcement Learning in a Complex Domain
et al.
PAC Model-Free Reinforcement Learning
Kearns and Singh:
Near-Optimal Reinforcement Learning in Polynomial Time
Rich Sutton's slides for Chapter 9:
Week 10: Case Studies
Slides from 11/6:
discussion slides
and a related source:
Leonid Kuvayev's Masters Thesis
Zhang and Dietterich's
job-shop scheduling paper
University of Michigan's
successes of RL page
Tony Cassandra's
POMDP for Dummies
Michael Littman's
POMDP information page
ICML 2004 workshop on
relational RL
Sašo Džeroski, Luc De Raedt and Kurt Driessens:
Relational Reinforcement Learning
Week 11: Abstraction: Options and Hierarchy
Slides from 11/13:
Slides from 11/15:
Sasha Sherstov's 2004 slides on
option discovery
A page devoted to
option discovery
Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning
by Kretchmar et al.
Journal version of the MAXQ paper
A follow-up paper on eliminating irrelevant variables within a subtask:
State Abstraction in MAXQ Hierarchical Reinforcement Learning
Tom Dietterich's
tutorial on abstraction
Nick Jong's paper on
state abstraction discovery
The slides
Week 12: Helicopter Control and Robot Soccer
Slides from 11/20:
The original
Some other papers on helicopter control and soccer:
Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods
J. Bagnell
and J. Schneider
Proceedings of the International Conference on Robotics and Automation 2001, IEEE, May, 2001.
Scaling Reinforcement Learning toward RoboCup Soccer
Peter Stone
and Richard S. Sutton.
Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537-544, Morgan Kaufmann, San Francisco, CA, 2001.
UT Austin Villa
RoboCup team home page.
Greg Kuhlmann's follow-up on
progress in 3v2 keepaway
Reinforcement Learning for Sensing Strategies
C. Kwok and
D. Fox
Proceedings of IROS, 2004.
Learning from Observation and Practice Using Primitives
Darrin Bentivegna, Christopher Atkeson, and Gordon Cheng.
AAAI Fall Symposium on Real Life Reinforcement Learning, 2004.
Week 13: Adaptive Representations and Transfer Learning
Kenneth Stanley and Risto Miikkulainen:
Efficient Evolution of Neural Network Topologies
Week 14: Advice and Multiagent Reinforcement Learning
Slides from 12/4:
The keepaway ones
Slides from 12/6:
The pursuit domain ones
A nice reading list on more advanced
multiagent RL
Gregory Kuhlmann, Peter Stone, Raymond Mooney, and Jude Shavlik:
Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer
Sonia Chernova and Manuela Veloso:
Confidence-Based Policy Learning from Demonstration Using Gaussian Mixture Models
Page maintained by
Peter Stone