CS395T: Reinforcement Learning: Theory and Practice -- Fall 2004: Resources Page
Resources for Reinforcement Learning: Theory and Practice
Week 0 (8/26): Class Overview
Slides from 8/26: pdf.
Week 1 (8/31,9/2): Introduction
Slides from 8/31: pdf.
Week 2 (9/7,9/9): Evaluative Feedback
Slides from 9/7: pdf.
Week 3 (9/14,16): The Reinforcement Learning Problem
Slides from 9/14: pdf.
Week 4 (9/21,23): Dynamic Programming
Slides from 9/21: pdf.
Slides from 9/23: pdf.
Email discussion on the Gambler's problem (a value-iteration sketch for that problem appears at the end of this week's list).
A paper on "The Complexity of Solving MDPs" (Littman, Dean, and Kaelbling, 1995).
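For concreteness, here is a minimal value-iteration sketch for the Gambler's problem as stated in Sutton and Barto's Example 4.3. The heads probability p_h = 0.4, the goal of 100, and the function names are illustrative assumptions, not taken from the email discussion above.

    # Value iteration for the Gambler's problem (Sutton & Barto, Example 4.3).
    # Reward is +1 on reaching the goal, 0 otherwise; the problem is undiscounted.
    def gamblers_value_iteration(p_h=0.4, goal=100, theta=1e-9):
        # V[0..goal]; V[0] and V[goal] stay 0 because the +1 reward is paid
        # on the transition into the goal state.
        V = [0.0] * (goal + 1)
        while True:
            delta = 0.0
            for s in range(1, goal):
                best = 0.0
                for stake in range(1, min(s, goal - s) + 1):
                    win_value = 1.0 if s + stake == goal else V[s + stake]
                    value = p_h * win_value + (1 - p_h) * V[s - stake]
                    best = max(best, value)
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < theta:
                return V

    # Greedy stake with respect to the converged values.
    def greedy_stake(V, s, p_h=0.4, goal=100):
        def q(stake):
            win_value = 1.0 if s + stake == goal else V[s + stake]
            return p_h * win_value + (1 - p_h) * V[s - stake]
        return max(range(1, min(s, goal - s) + 1), key=q)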
Week 5 (9/28,9/30): Monte Carlo Methods
Slides from 9/28: pdf.
Slides from 9/30: pdf.
A paper that addresses the relationship between first-visit and every-visit MC (Singh and Sutton, 1996). For the theoretical relationships, see the section starting at Section 3.3 (and the referenced appendices). A sketch of the two estimators appears at the end of this week's list.
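A minimal sketch of the two estimators the paper compares: first-visit and every-visit Monte Carlo prediction. The episode format (a list of (state, reward) pairs) and undiscounted returns are assumptions chosen to keep the example short.

    from collections import defaultdict

    def mc_prediction(episodes, first_visit=True):
        # episodes: list of trajectories, each a list of (state, reward) pairs,
        # where reward is the reward received after leaving that state.
        returns_sum = defaultdict(float)
        returns_count = defaultdict(int)
        for episode in episodes:
            # Compute the (undiscounted) return following each time step.
            G = 0.0
            returns = [0.0] * len(episode)
            for t in reversed(range(len(episode))):
                G += episode[t][1]
                returns[t] = G
            seen = set()
            for t, (state, _) in enumerate(episode):
                if first_visit and state in seen:
                    continue  # first-visit: only the first occurrence counts
                seen.add(state)
                returns_sum[state] += returns[t]
                returns_count[state] += 1
        return {s: returns_sum[s] / returns_count[s] for s in returns_sum}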
Week 6 (10/5,7): Temporal Difference Learning
Slides from 10/5: pdf.
Slides from 10/7: pdf.
A couple of articles on the details of actor-critic in practice, by Tsitsiklis and by Williams (a small actor-critic sketch appears at the end of this week's list).
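A minimal one-step actor-critic sketch in the spirit of those articles: a tabular TD(0) critic plus a softmax actor adjusted along the TD error. The env.reset()/env.step() interface and all parameter values are illustrative assumptions, not anything from the papers.

    import numpy as np

    def actor_critic(env, n_states, n_actions, episodes=500,
                     alpha_v=0.1, alpha_p=0.01, gamma=0.99):
        V = np.zeros(n_states)                   # critic: state values
        prefs = np.zeros((n_states, n_actions))  # actor: action preferences

        def policy(s):
            p = np.exp(prefs[s] - prefs[s].max())  # softmax over preferences
            return p / p.sum()

        for _ in range(episodes):
            s, done = env.reset(), False           # assumed env interface
            while not done:
                probs = policy(s)
                a = np.random.choice(n_actions, p=probs)
                s2, r, done = env.step(a)          # assumed env interface
                target = r + (0.0 if done else gamma * V[s2])
                td_error = target - V[s]
                V[s] += alpha_v * td_error         # critic update (TD(0))
                grad = -probs                      # grad of log softmax policy
                grad[a] += 1.0
                prefs[s] += alpha_p * td_error * grad  # actor update
                s = s2
        return V, prefs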
Week 7 (10/12,14): Eligibility Traces
Slides from 10/12: pdf.
Slides from 10/14: pdf.
The equivalence of MC and first-visit TD(1) is proven in the same Singh and Sutton paper referenced above (Singh and Sutton, 1996), starting at Section 2.4. A tabular TD(lambda) sketch appears at the end of this week's list.
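To make the lambda = 1 case concrete, here is a minimal tabular TD(lambda) sketch with accumulating (or, optionally, replacing) eligibility traces. It is an illustration under an assumed transition-list format, not a reproduction of the paper's offline-updating setting or its proof.

    from collections import defaultdict

    def td_lambda(episodes, alpha=0.1, gamma=1.0, lam=1.0, replacing=False):
        # episodes: list of episodes, each a list of (s, r, s_next, done) tuples.
        V = defaultdict(float)
        for episode in episodes:
            e = defaultdict(float)                 # eligibility traces
            for s, r, s2, done in episode:
                delta = r + (0.0 if done else gamma * V[s2]) - V[s]
                e[s] = 1.0 if replacing else e[s] + 1.0  # replacing vs. accumulating
                for state in list(e):
                    V[state] += alpha * delta * e[state]
                    e[state] *= gamma * lam        # decay all traces
        return dict(V)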
Week 8 (10/19,21): Generalization and Function Approximation
Slides from 10/19: pdf.
Slides from 10/21: pdf.
The paper Igor presented in class: Dopamine: Generalization and Bonuses (2002) Kakade and Dayan.
Andrew Smith's Applications of the Self-Organising Map to Reinforcement Learning.
Bernd Fritzke's very clear Some Competitive Learning Methods.
DemoGNG - a nice visual demo of competitive learning.
Residual Algorithms: Reinforcement Learning with Function Approximation (1995) Leemon Baird. More on the Baird counterexample, as well as an alternative to doing gradient descent on the MSE.
Boyan, J. A., and A. W. Moore, Generalization in Reinforcement Learning: Safely Approximating the Value Function. In Tesauro, G., D. S. Touretzky, and T. K. Leen (eds.), Advances in Neural Information Processing Systems 7 (NIPS). MIT Press, 1995. Another example of function approximation divergence and a proposed solution.
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces (1998) Juan Carlos Santamaria, Richard S. Sutton, Ashwin Ram. Comparisons of several types of function approximators (including instance-based methods like Kanerva coding).
Least-Squares Temporal Difference Learning, Justin Boyan (an LSTD sketch appears at the end of this week's list).
A Convergent Form of Approximate Policy Iteration (2002) T. J. Perkins and D. Precup. A new convergence guarantee with function approximation.
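A minimal LSTD(0) sketch in the spirit of Boyan's paper: instead of following a gradient, it solves directly for the linear weights that zero the expected TD update. The feature-vector transition format and the small ridge term are assumptions for illustration.

    import numpy as np

    def lstd(transitions, n_features, gamma=0.99, ridge=1e-3):
        # transitions: list of (phi_s, reward, phi_s_next, done) tuples,
        # where phi_* are feature vectors of length n_features.
        A = ridge * np.eye(n_features)   # small ridge term keeps A invertible
        b = np.zeros(n_features)
        for phi, r, phi_next, done in transitions:
            target_phi = np.zeros(n_features) if done else phi_next
            A += np.outer(phi, phi - gamma * target_phi)
            b += r * phi
        return np.linalg.solve(A, b)     # weights of the approximate value function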
Week 9 (10/26,28): Planning and Learning
Slides from 10/26: pdf; the planning ones.
Slides from 10/28: pdf.
Week 10 (11/2,4): Case Studies
Slides from 11/2: pdf.
Slides from 11/4: pdf.
ICML 2004 workshop on relational RL.
Tony Cassandra's POMDPs for Dummies.
Michael Littman's POMDP information page.
Week 11 (11/9,11): Abstraction: Options and Hierarchy
Slides from 11/9: pdf.
Slides from 11/11: pdf.
Alex's discussion slides.
Jon's discussion slides.
Automatic Discovery of Subgoals in RL using Diverse Density, by McGovern and Barto (a minimal sketch of the option construct these papers build on appears at the end of this week's list).
Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning, by Kretchmar et al.
The journal version of the MaxQ paper.
A follow-up paper on eliminating irrelevant variables within a subtask: State Abstraction in MAXQ Hierarchical Reinforcement Learning.
Tom Dietterich's tutorial on abstraction.
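For readers new to the options framework these papers build on, here is a minimal sketch of an option (initiation set, intra-option policy, termination condition) and the SMDP Q-learning update applied when one terminates. The class and function names, the env.step interface, and the layout of Q as a defaultdict(float) keyed by (state, option index) are illustrative assumptions only.

    from dataclasses import dataclass
    from typing import Callable, Set

    @dataclass
    class Option:
        initiation: Set[int]               # states where the option may be invoked
        policy: Callable[[int], int]       # intra-option policy over primitive actions
        terminates: Callable[[int], bool]  # termination condition beta(s)

    def execute_option(env, state, option, gamma=0.99):
        """Run the option to termination; return (next_state, discounted_reward, duration)."""
        total, discount, k = 0.0, 1.0, 0
        while not option.terminates(state):
            state, reward, done = env.step(option.policy(state))  # assumed interface
            total += discount * reward
            discount *= gamma
            k += 1
            if done:
                break
        return state, total, k

    def smdp_q_update(Q, s, o, s2, reward, k, options, alpha=0.1, gamma=0.99):
        # SMDP Q-learning: discount by gamma**k, where k is the option's duration,
        # and back up from the best option available in the resulting state
        # (assumes at least one option can be initiated there).
        best_next = max(Q[(s2, o2)] for o2, opt in enumerate(options)
                        if s2 in opt.initiation)
        Q[(s, o)] += alpha * (reward + gamma ** k * best_next - Q[(s, o)])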
Week 12 (11/16,18): Helicopter and Robot Control
Slides from 11/16: pdf.
Slides from 11/18: pdf.
Andrew Moore's tutorial on VC dimension, and a paper by him on A Nonparametric Approach to Noisy and Costly Optimization.
PEGASUS: A policy search method for large MDPs and POMDPs, Andrew Y. Ng and Michael Jordan. In Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference, 2000.
A section from a David Cohn paper on locally weighted regression.
A page on locally weighted polynomial regression.
A good tutorial on memory-based learning (including material on kernels and LWPR) by Andrew Moore. A locally weighted regression sketch appears at the end of this week's list.
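A minimal locally weighted (linear) regression sketch of the kind covered in the Cohn section and the Moore tutorial: each query gets its own weighted least-squares fit under a Gaussian kernel. The bandwidth and ridge values are assumed tuning parameters.

    import numpy as np

    def lwr_predict(X, y, query, bandwidth=1.0, ridge=1e-6):
        X = np.asarray(X, dtype=float)          # (n, d) training inputs
        y = np.asarray(y, dtype=float)          # (n,) training targets
        query = np.asarray(query, dtype=float)  # (d,) query point
        # Gaussian kernel weights based on squared distance to the query.
        d2 = np.sum((X - query) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Weighted least squares with an intercept term.
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        WX = Xb * w[:, None]
        A = Xb.T @ WX + ridge * np.eye(Xb.shape[1])
        beta = np.linalg.solve(A, WX.T @ y)
        return np.append(query, 1.0) @ beta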
Week 13 (11/23): Robot Soccer
Slides from 11/23: pdf; the keepaway slides; and a few more.
Dieter Fox's mobile robotics page: project animations; landmark-based localization.
Michail G. Lagoudakis' page has a paper on LSPI as well as slides from his thesis defense about it.
The UT Austin Villa RoboCup team home page.
Greg Kuhlmann's follow-up on progress in 3v2 keepaway.
Matt Taylor's recent paper on behavior transfer in keepaway.
Week 14 (11/30,12/2): Incorporating Advice
Slides from 11/30: pdf.
Slides from 12/2: pdf.
PILLAR
Pengo
Page maintained by Peter Stone. Questions? Send me mail.