Learning Methods for Sequential Decision Making with Imperfect Representations
Shivaram Kalyanakrishnan, 2011
Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, and Peter Auer, 2010
Algorithms for Reinforcement Learning
Csaba Szepesvári, 2010
Evolving Neural Networks for Strategic Decision-Making Problems
Nate Kohl and Risto Miikkulainen, 2009
Reinforcement learning in the brain
Yael Niv, 2009
Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, and Michael L. Littman, 2009
On the role of tracking in stationary environments
Richard S. Sutton, Anna Koop, and David Silver, 2007
Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Matthew E. Taylor, Peter Stone, and Yaxin Liu, 2007
PAC model-free reinforcement learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, 2006
Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
Terran Lane and William D. Smart, 2005
A theoretical analysis of Model-Based Interval Estimation
Alexander L. Strehl and Michael L. Littman, 2005
Temporal difference models describe higher-order learning in humans
Ben Seymour, John P. O'Doherty, Peter Dayan, Martin Koltzenburg, Anthony K. Jones, Raymond J. Dolan, Karl J. Friston, and Richard S. Frackowiak, 2004
R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
Ronen I. Brafman and Moshe Tennenholtz, 2003
Near-Optimal Reinforcement Learning in Polynomial Time
Michael Kearns and Satinder Singh, 2002
Eligibility Traces for Off-Policy Policy Evaluation
Doina Precup, Richard S. Sutton, and Satinder P. Singh, 2000
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári, 2000
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
Andrew Y. Ng, Daishi Harada, and Stuart J. Russell, 1999
Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
Jette Randløv and Preben Alstrøm, 1998
Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 1998
No free lunch theorems for optimization
David H. Wolpert and William G. Macready, 1997
Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996
On the Complexity of Solving Markov Decision Problems
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling, 1995
Markov Decision Processes
Martin L. Puterman, 1994
On-line Q-learning using connectionist systems
G. A. Rummery and M. Niranjan, 1994
On bias and step size in temporal-difference learning
Richard S. Sutton and Satinder P. Singh, 1994
Practical Issues in Temporal Difference Learning
Gerald Tesauro, 1992
Q-Learning
Christopher J. C. H. Watkins and Peter Dayan, 1992
Learning to Predict by the Methods of Temporal Differences
Richard S. Sutton, 1988
Dynamic Programming
Richard Bellman, 1957
Some aspects of the sequential design of experiments
Herbert Robbins, 1952