CS394R: Reinforcement Learning: Theory and Practice -- Fall 2007: Resources Page
Resources for
Reinforcement Learning: Theory and Practice
Week 0: Class Overview
Slides from 8/30:
pdf
.
Week 1: Introduction
Slides from 9/4, 9/6:
pdf
.
Week 2: Evaluative Feedback
Slides from 9/11:
pdf
.
Vermorel and Mohri:
Multi-Armed Bandit Algorithms and Empirical Evaluation
.
Rich Sutton's slides for Chapter 2:
html
.
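As a companion to the action-value methods in Chapter 2 and the empirical comparison in the Vermorel and Mohri paper above, here is a minimal sketch (not from either source) of an ε-greedy agent with incremental sample-average estimates; the Gaussian 10-armed testbed and parameter values are arbitrary illustration choices.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1):
    """Run one epsilon-greedy agent on a stationary Gaussian bandit.

    true_means: list of true expected rewards, one per arm (unknown to the agent).
    Returns the total reward collected over `steps` pulls.
    """
    k = len(true_means)
    estimates = [0.0] * k   # sample-average action-value estimates Q(a)
    counts = [0] * k        # number of times each arm has been pulled
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                        # explore
        else:
            a = max(range(k), key=lambda i: estimates[i])  # exploit (greedy)
        r = random.gauss(true_means[a], 1.0)               # noisy reward
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]     # incremental sample mean
        total += r
    return total

# Example: a 10-armed testbed with randomly drawn true means.
print(epsilon_greedy_bandit([random.gauss(0, 1) for _ in range(10)]))
```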
Week 3: The Reinforcement Learning Problem
Slides from 9/18:
pdf
.
Dietterich:
The MAXQ Method for Hierarchical Reinforcement Learning
.
Jong and Stone:
State Abstraction Discovery from Irrelevant State Variables
.
Rich Sutton's slides for Chapter 3:
pdf
.
Week 4: Dynamic Programming
Slides from 9/25:
pdf
.
Email discussion on the Gambler's problem
.
A paper on
"The Complexity of solving MDPs"
(Littman, Dean, and Kaelbling, 1995).
Tumer and Agogino:
Distributed Agent-Based Air Traffic Flow Management
.
Pashenkova, Rish, and Dechter:
Value Iteration and Policy Iteration Algorithms for Markov Decision Problems
.
Rich Sutton's slides for Chapter 4:
html
.
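As a concrete reference point for the Gambler's-problem discussion and the value/policy iteration comparison above, here is a minimal value iteration sketch for the Gambler's Problem as described in the textbook (goal of 100, heads probability p_h); the stopping threshold and sweep order are arbitrary choices here, not anything prescribed by the readings.

```python
def gamblers_value_iteration(p_h=0.4, goal=100, theta=1e-9):
    """Value iteration for the Gambler's Problem (Sutton & Barto, Example 4.3).

    States are the gambler's capital 1..goal-1; a stake of `a` wins with
    probability p_h.  The only reward is +1 on reaching the goal, gamma = 1,
    which is encoded here by fixing V[goal] = 1 and V[0] = 0.
    Returns the value function (probability of winning) indexed by capital.
    """
    V = [0.0] * (goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):
            best = 0.0
            for a in range(1, min(s, goal - s) + 1):
                q = p_h * V[s + a] + (1 - p_h) * V[s - a]
                best = max(best, q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) sweep
        if delta < theta:
            return V

V = gamblers_value_iteration()
print(V[50])  # probability of winning starting with capital 50
```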
Week 5: Monte Carlo Methods
Slides from 10/2:
pdf
.
A paper that
addresses the relationship between first-visit and every-visit MC
(Singh and Sutton, 1996). For the theoretical relationships, see the material starting at Section 3.3 (and the referenced appendices). A short sketch of the two estimators appears at the end of this week's list.
Rich Sutton's slides for Chapter 5:
html
.
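The sketch promised above: a minimal illustration (not taken from the Singh and Sutton paper) of first-visit versus every-visit Monte Carlo prediction for a single state. The `episodes` argument is an assumed input format, a list of (state, reward) trajectories supplied by the caller.

```python
def mc_value_estimate(episodes, target_state, gamma=1.0, first_visit=True):
    """Monte Carlo estimate of V(target_state) from complete episodes.

    episodes: list of episodes, each a list of (state, reward) pairs, where
    `reward` is the reward received after leaving `state`.
    first_visit=True averages the return following only the first occurrence
    of target_state in each episode; first_visit=False (every-visit) averages
    the return following every occurrence.
    """
    returns = []
    for episode in episodes:
        # Compute the return G_t following each time step, working backwards.
        G = 0.0
        returns_from = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            G = episode[t][1] + gamma * G
            returns_from[t] = G
        visits = [t for t, (s, _) in enumerate(episode) if s == target_state]
        if first_visit:
            visits = visits[:1]
        returns.extend(returns_from[t] for t in visits)
    return sum(returns) / len(returns) if returns else 0.0
```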
Week 6: Temporal Difference Learning
Slides from 10/9:
pdf
.
A couple of articles on the details of actor-critic in practice by
Tsitsiklis
and by
Williams
.
Sprague and Ballard:
Multiple-Goal Reinforcement Learning with Modular Sarsa(0)
.
Rich Sutton's slides for Chapter 6:
html
.
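Since Sarsa(0) figures both in the chapter and in the Sprague and Ballard paper above, here is a minimal tabular Sarsa(0) sketch; the `env` object with reset() and step(action) methods is an assumed interface, not anything defined in the readings.

```python
import random
from collections import defaultdict

def sarsa0(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Sarsa(0): on-policy TD control with an epsilon-greedy policy.

    Assumed interface: env.reset() -> state, env.step(action) -> (state, reward, done).
    Returns the learned action-value table Q[state][action].
    """
    Q = defaultdict(lambda: [0.0] * n_actions)

    def choose(s):
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[s][a])

    for _ in range(episodes):
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = choose(s2) if not done else None
            target = r if done else r + gamma * Q[s2][a2]
            Q[s][a] += alpha * (target - Q[s][a])   # Sarsa(0) update
            s, a = s2, a2
    return Q
```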
Week 7: Eligibility Traces
Slides from 10/16:
pdf
.
The equivalence of MC and first-visit TD(1) is proven in the
same Singh and Sutton paper that's referenced above
(Singh and Sutton, 1996). See the material starting at Section 2.4. The λ = 1 special case of the λ-return is written out at the end of this week's list.
Dayan:
The Convergence of TD(λ) for General λ
.
Rich Sutton's slides for Chapter 7:
html
.
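The λ = 1 check mentioned above, written out as a quick sketch of the forward-view λ-return (not a restatement of the Singh and Sutton proof):

```latex
% Forward-view lambda-return for an episode terminating at time T,
% built from the n-step returns G_t^{(n)}:
\[
  G_t^{\lambda} \;=\; (1-\lambda)\sum_{n=1}^{T-t-1}\lambda^{\,n-1}\,G_t^{(n)}
  \;+\; \lambda^{\,T-t-1}\,G_t .
\]
% At lambda = 1 every finite-n weight (1-lambda)*lambda^{n-1} vanishes and
% only the last term survives, so G_t^{1} = G_t, the complete Monte Carlo
% return; this is why the offline lambda-return algorithm at lambda = 1
% reduces to a Monte Carlo method.
```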
Week 8: Generalization and Function Approximation
Slides from 10/23:
pdf
.
Dopamine: Generalization and Bonuses
(2002) Kakade and Dayan.
Andrew Smith's
Applications of the Self-Organising Map to Reinforcement Learning
Bernd Fritzke's very clear
Some Competitive Learning Methods
DemoGNG
- a nice visual demo of competitive learning
Residual Algorithms: Reinforcement Learning with Function Approximation
(1995) Leemon Baird. More on the Baird counterexample as well as an alternative to doing gradient descent on the MSE.
Boyan, J. A., and A. W. Moore,
Generalization in Reinforcement Learning: Safely Approximating the Value Function.
In Tesauro, G., D. S. Touretzky, and T. K. Leen (eds.), Advances in Neural Information Processing Systems 7 (NIPS). MIT Press, 1995. Another example of function approximation divergence and a proposed solution.
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
(1998) Juan Carlos Santamaria, Richard S. Sutton, and Ashwin Ram. Comparisons of several types of function approximators (including instance-based methods such as Kanerva coding).
Least-Squares Temporal Difference Learning
Justin Boyan.
A Convergent Form of Approximate Policy Iteration
(2002) T. J. Perkins and D. Precup. A new convergence guarantee with function approximation.
Online calculators for t-tests
Slides on
Decision Trees
from Tom Mitchell's book
Machine Learning
Moore and Atkeson:
The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State Spaces
.
Sherstov and Stone:
Function Approximation via Tile Coding: Automating Parameter Choice
.
Chapman and Kaelbling:
Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons
.
Rich Sutton's slides for Chapter 8:
html
.
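As background for the Sherstov and Stone tile-coding paper above, here is a minimal sketch of uniform grid tile coding over a two-dimensional continuous input; the tile widths, number of tilings, and offsets are arbitrary illustration values, not the automated choices the paper studies.

```python
def tile_indices(x, y, n_tilings=8, tiles_per_dim=10, lo=0.0, hi=1.0):
    """Return one active tile index per tiling for a 2-D point in [lo, hi)^2.

    Each tiling is a uniform grid shifted by a fraction of a tile width, so a
    point activates n_tilings binary features; a linear function approximator
    over these features is just the sum of the weights of the active tiles.
    """
    width = (hi - lo) / tiles_per_dim
    active = []
    for t in range(n_tilings):
        offset = t * width / n_tilings                 # per-tiling grid offset
        col = int((x - lo + offset) / width)
        row = int((y - lo + offset) / width)
        col = min(col, tiles_per_dim)                  # offset points near hi land in an edge tile
        row = min(row, tiles_per_dim)
        tile = t * (tiles_per_dim + 1) ** 2 + row * (tiles_per_dim + 1) + col
        active.append(tile)
    return active

# A linear value estimate would be: v = sum(weights[i] for i in tile_indices(x, y))
print(tile_indices(0.3, 0.7))
```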
Week 9: Planning and Learning
Ng
et al.
:
Autonomous helicopter flight via reinforcement learning
.
Szita and Lőrincz:
Learning Tetris Using the Noisy Cross-Entropy Method
.
Kalyanakrishnan
et al.
:
Model-based Reinforcement Learning in a Complex Domain
.
Strehl
et al.
:
PAC Model-Free Reinforcement Learning
.
Kearns and Singh:
Near-Optimal Reinforcement Learning in Polynomial Time
.
Rich Sutton's slides for Chapter 9:
html
.
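The model-based methods in this week's readings build on the Dyna idea from the chapter: learn a model from real experience and use simulated experience from it for extra planning updates. Below is a minimal tabular Dyna-Q sketch, assuming the same hypothetical env.reset()/env.step() interface as the Sarsa example above and a deterministic environment for the learned model.

```python
import random
from collections import defaultdict

def dyna_q(env, n_actions, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: Q-learning plus planning from a learned deterministic model.

    Assumed interface: env.reset() -> state, env.step(action) -> (state, reward, done).
    """
    Q = defaultdict(lambda: [0.0] * n_actions)
    model = {}  # (state, action) -> (reward, next_state, done)

    def choose(s):
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[s][a])

    def backup(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = choose(s)
            s2, r, done = env.step(a)
            backup(s, a, r, s2, done)            # direct RL from the real transition
            model[(s, a)] = (r, s2, done)        # update the (deterministic) model
            for _ in range(planning_steps):      # planning: replay simulated transitions
                ps, pa = random.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                backup(ps, pa, pr, ps2, pdone)
            s = s2
    return Q
```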
Week 10: Case Studies
Slides from 11/6:
pdf
.
Doran's
discussion slides
and a related source:
Leonid Kuvayev's Master's Thesis
.
Zhang and Dietterich's
job-shop scheduling paper
.
University of Michigan's
successes of RL page
Tony Cassandra's
POMDP for Dummies
Michael Littman's
POMDP information page
ICML 2004 workshop on
relational RL
Sašo Džeroski, Luc De Raedt and Kurt Driessens:
Relational Reinforcement Learning
.
Week 11: Abstraction: Options and Hierarchy
Slides from 11/13:
pdf
.
Slides from 11/15:
pdf
.
Sasha Sherstov's 2004 slides on
option discovery
.
A page devoted to
option discovery
Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning
by Kretchmar et al.
The
Journal version of the MaxQ paper
A follow-up paper on eliminating irrelevant variables within a subtask:
State Abstraction in MAXQ Hierarchical Reinforcement Learning
Tom Dietterich's
tutorial on abstraction
.
Nick Jong's paper on
state abstraction discovery
.
The accompanying slides
.
Week 12: Helicopter Control and Robot Soccer
Slides from 11/20:
pdf
.
The original
PEGASUS paper
.
Some other papers on helicopter control and soccer:
Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods
.
J. Bagnell
and J. Schneider.
Proceedings of the International Conference on Robotics and Automation, IEEE, May 2001.
Scaling Reinforcement Learning toward RoboCup Soccer
.
Peter Stone
and Richard S. Sutton.
Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537-544, Morgan Kaufmann, San Francisco, CA, 2001.
The
UT Austin Villa
RoboCup team home page.
Greg Kuhlmann's follow-up on
progress in 3v2 keepaway
Reinforcement Learning for Sensing Strategies
.
C. Kwok and
D. Fox
.
Proceedings of IROS, 2004.
Learning from Observation and Practice Using Primitives
.
Darrin Bentivegna
, Christopher Atkeson, and Gordon Cheng.
AAAI Fall Symposium on Real Life Reinforcement Learning, 2004.
Week 13: Adaptive Representations and Transfer Learning
Kenneth Stanley and Risto Miikkulainen:
Efficient Evolution of Neural Network Topologies
.
Week 14: Advice and Multiagent Reinforcement Learning
Slides from 12/4:
pdf
.
The keepaway slides
.
Slides from 12/6:
pdf
.
The pursuit domain slides
.
PILLAR
Pengo
A nice reading list on more advanced
multiagent RL
.
Gregory Kuhlmann, Peter Stone, Raymond Mooney, and Jude Shavlik:
Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer
.
Sonia Chernova and Manuela Veloso:
Confidence-Based Policy Learning from Demonstration Using Gaussian Mixture Models
.
Page maintained by
Peter Stone
Questions? Send me
mail