Almost Optimal Exploration in Multi-Armed Bandits
Zohar Karnin, Tomer Koren, and Oren Somekh, 2013
Information Complexity in Bandit Subset Selection
Emilie Kaufmann and Shivaram Kalyanakrishnan, 2013
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric, 2012
Planning in Reward-Rich Domains via PAC Bandits
Sergiu Goschin, Ari Weinstein, Michael L. Littman, and Erick Chastain, 2012
Learning Methods for Sequential Decision Making with Imperfect Representations
Shivaram Kalyanakrishnan, 2011
Learning to Predict Humanoid Fall
Shivaram Kalyanakrishnan and Ambarish Goswami, 2011
Characterizing reinforcement learning methods through parameterized learning problems
Shivaram Kalyanakrishnan and Peter Stone, 2011
On Learning with Imperfect Representations
Shivaram Kalyanakrishnan and Peter Stone, 2011
On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer
Daniel Urieli, Patrick MacAlpine, Shivaram Kalyanakrishnan, Yinon Bentor, and Peter Stone, 2011
Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning
Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone, 2011
Exploiting Best-Match Equations for Efficient Reinforcement Learning
Harm van Seijen, Shimon Whiteson, Hado van Hasselt, and Marco Wiering, 2011
Insights in Reinforcement Learning: formal analysis and empirical evaluation of temporal-difference learning algorithms
Hado Philip van Hasselt, 2011
Success, strategy and skill: an experimental study
Christopher Archibald, Alon Altman, and Yoav Shoham, 2010
Best Arm Identification in Multi-Armed Bandits
Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos, 2010
UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem
Peter Auer and Ronald Ortner, 2010
Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
Carlton Downey and Scott Sanner, 2010
A Brief Survey of Parametric Value Function Approximation
Matthieu Geist and Olivier Pietquin, 2010
Simulation optimization using the cross-entropy method with optimal computing budget allocation
Donghai He, Loo Hay Lee, Chun-Hung Chen, Michael C. Fu, and Segev Wasserkrug, 2010
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
Junya Honda and Akimichi Takemura, 2010
Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, and Peter Auer, 2010
Non-Stochastic Bandit Slate Problems
Satyen Kale, Lev Reyzin, and Robert E. Schapire, 2010
Predicting Falls of a Humanoid Robot through Machine Learning
Shivaram Kalyanakrishnan and Ambarish Goswami, 2010
Three Humanoid Soccer Platforms: Comparison and Synthesis
Shivaram Kalyanakrishnan, Todd Hester, Michael Quinlan, Yinon Bentor, and Peter Stone, 2010
Efficient Selection of Multiple Bandit Arms: Theory and Practice
Shivaram Kalyanakrishnan and Peter Stone, 2010
Learning Complementary Multiagent Behaviors: A Case Study
Shivaram Kalyanakrishnan and Peter Stone, 2010
Fall Detection of Two-legged Walking Robots using Multi-way Principal Components Analysis
J. G. Daniël Karssen and Martijn Wisse, 2010
Regret bounds for sleeping experts and bandits
Robert Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma, 2010
Finite-Sample Analysis of LSTD
Alessandro Lazaric, Mohammad Ghavamzadeh, and Rémi Munos, 2010
A contextual-bandit approach to personalized news article recommendation
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire, 2010
Estimating Learning Rates in Evolution and TDL: Results on a Simple Grid-World Problem
Simon M. Lucas, 2010
Toward Off-Policy Learning Control with Function Approximation
Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, and Richard S. Sutton, 2010
Biped Walk Learning Through Playback and Corrective Demonstration
Çetin Meriçli and Manuela Veloso, 2010
Generalized Direction Changing Fall Control of Humanoid Robots Among Multiple Objects
Umashankar Nagarajan and Ambarish Goswami, 2010
Relative Entropy Policy Search
Jan Peters, Katharina Mülling, and Yasemin Altün, 2010
Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
Marek Petrik, Gavin Taylor, Ron Parr, and Shlomo Zilberstein, 2010
Biped Walking using Coronal and Sagittal Movements based on Truncated Fourier Series
Nima Shafii, Luis Paulo Reis, and Nuno Lau, 2010
Application of Machine Learning To Epileptic Seizure Detection
Ali Shoeb and John Guttag, 2010
Algorithms for Reinforcement Learning
Csaba Szepesvári, 2010
SZ-Tetris as a Benchmark for Studying Key Problems of Reinforcement Learning
István Szita and Csaba Szepesvári, 2010
Model-based reinforcement learning with nearly tight exploration complexity bounds
István Szita and Csaba Szepesvári, 2010
Reinforcement learning of motor skills in high dimensions: A path integral approach
Evangelos Theodorou, Jonas Buchli, and Stefan Schaal, 2010
Improvements on Learning Tetris with Cross-Entropy
Christophe Thierry and Bruno Scherrer, 2010
Building Controllers for Tetris
Christophe Thierry and Bruno Scherrer, 2010
$\epsilon$-First Policies for Budget-Limited Multi-Armed Bandits
Long Tran-Thanh, Archie Chapman, Enrique Munoz de Cote, Alex Rogers, and Nicholas R. Jennings, 2010
Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery
Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, and Doina Precup, 2010
Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning
Shimon Whiteson, Matthew E. Taylor, and Peter Stone, 2010
Fall Detection and Management in Biped Humanoid Robots
Javier Ruiz-del-Solar, Javier Moya, and Isao Parra-Tsunekawa, 2010
Modeling billiards games
Christopher Archibald and Yoav Shoham, 2009
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári, 2009
On the Evolution of Artificial Tetris Players
Amine Boumaza, 2009
Pure Exploration in Multi-armed Bandits Problems
Sébastien Bubeck, Rémi Munos, and Gilles Stoltz, 2009
Combinatorial Bandits
Nicolò Cesa-Bianchi and Gábor Lugosi, 2009
The adaptive $k$-meteorologists problem and its application to structure learning and feature selection in reinforcement learning
Carlos Diuk, Lihong Li, and Bethany R. Leffler, 2009
Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem
Damien Ernst, Mevludin Glavic, Florin Capitanescu, and Louis Wehenkel, 2009
The Knowledge-Gradient Policy for Correlated Normal Beliefs
Peter Frazier, Warren Powell, and Savas Dayanik, 2009
A Case Study on Improving Defense Behavior in Soccer Simulation 2D: The NeuroHassle Approach
Thomas Gabel, Martin Riedmiller, and Florian Trost, 2009
Computational Sustainability: Computational Methods for a Sustainable Environment, Economy, and Society
Carla P. Gomes, 2009
Improving Optimistic Exploration in Model-Free Reinforcement Learning
Marek Grześ and Daniel Kudenko, 2009
The WEKA Data Mining Software: An Update
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten, 2009
The CMA Evolution Strategy: A Tutorial
Nikolaus Hansen, 2009
A Method for Handling Uncertainty in Evolutionary Optimization With an Application to Feedback Control of Combustion
Nikolaus Hansen, André S.P. Niederberger, Lino Guzzella, and Petros Koumoutsakos, 2009
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
Verena Heidrich-Meisner and Christian Igel, 2009
Neuroevolution strategies for episodic reinforcement learning
Verena Heidrich-Meisner and Christian Igel, 2009
Probabilistic Balance Monitoring for Bipedal Robots
O. Höhn and W. Gerth, 2009
SarsaLandmark: an algorithm for learning in POMDPs with landmarks
Michael R. James and Satinder Singh, 2009
Generalized AMOC Curves For Evaluation and Improvement of Event Surveillance
Xia Jiang, Gregory F. Cooper, and Daniel B. Neill, 2009
Feature Selection for Value Function Approximation Using Bayesian Model Selection
Tobias Jung and Peter Stone, 2009
An empirical analysis of value function-based and policy search reinforcement learning
Shivaram Kalyanakrishnan and Peter Stone, 2009
The UT Austin Villa 3D Simulation Soccer Team 2008
Shivaram Kalyanakrishnan, Yinon Bentor, and Peter Stone, 2009
Fall detection in walking robots by multi-way principal component analysis
J. G. Daniël Karssen and Martijn Wisse, 2009
Learning motor primitives for robotics
Jens Kober and Jan Peters, 2009
Evolving Neural Networks for Strategic Decision-Making Problems
Nate Kohl and Risto Miikkulainen, 2009
Regularization and feature selection in least-squares temporal difference learning
J. Zico Kolter and Andrew Y. Ng, 2009
Automatic Parameter Optimization for a Dynamic Robot Simulation
Tim Laue and Matthias Hebbel, 2009
Learning Representation and Control in Markov Decision Processes: New Frontiers
Sridhar Mahadevan, 2009
Nonparametric representation of an approximated Poincaré map for learning biped locomotion
Jun Morimoto and Christopher G. Atkeson, 2009
Reinforcement learning in the brain
Yael Niv, 2009
Biasing Approximate Dynamic Programming with a Lower Discount Factor
Marek Petrik and Bruno Scherrer, 2009
Feature Discovery in Approximate Dynamic Programming
Philippe Preux, Sertan Girgin, and Manuel Loth, 2009
Reinforcement learning for robot soccer
Martin Riedmiller, Thomas Gabel, Roland Hafner, and Sascha Lange, 2009
Evolving Multi-modal Behavior in NPCs
Jacob Schrum and Risto Miikkulainen, 2009
Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, and Michael L. Littman, 2009
Stochastic search using the natural gradient
Yi Sun, Daan Wierstra, Tom Schaul, and Jürgen Schmidhuber, 2009
Fast gradient-descent methods for temporal-difference learning with linear function approximation
Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, and Eric Wiewiora, 2009
Efficient covariance matrix update for variable metric evolution strategies
Thorsten Suttorp, Nikolaus Hansen, and Christian Igel, 2009
Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
Michael T. Todd, Yael Niv, and Jonathan D. Cohen, 2009
Ontogenetic and Phylogenetic Reinforcement Learning
Julian Togelius, Tom Schaul, Daan Wierstra, Christian Igel, Faustino Gomez, and Jürgen Schmidhuber, 2009
Generalized Domains for Empirical Evaluations in Reinforcement Learning
Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone, 2009
Designing falling motions for a humanoid soccer goalie
Tobias Wilken, Marcell Missura, and Sven Behnke, 2009
Safe Fall: Humanoid robot fall direction change through intelligent stepping and inertia shaping
Seung-kook Yun, Ambarish Goswami, and Yoshiaki Sakagami, 2009
CMDragons 2009 Extended Team Description
Stefan Zickler, James Bruce, Joydeep Biswas, Michael Licitra, and Manuela Veloso, 2009
A Theoretical and Empirical Analysis of Expected Sarsa
Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering, 2009
Learning to fall: Designing low damage fall sequences for humanoid soccer robots
J. Ruiz-del-Solar, R. Palma-Amestoy, R. Marchant, I. Parra-Tsunekawa, and P. Zegers, 2009
Incremental Natural Actor-Critic Algorithms
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, and Mark Lee, 2008
A Comprehensive Survey of Multiagent Reinforcement Learning
Lucian Buşoniu, Robert Babuška, and Bart De Schutter, 2008
An empirical evaluation of supervised learning in high dimensions
Rich Caruana, Nikolaos Karampatziakis, and Ainur Yessenalina, 2008
Efficient Simulation Budget Allocation for Selecting an Optimal Subset
Chun-Hung Chen, Donghai He, Michael Fu, and Loo Hay Lee, 2008
The Role of Value Systems in Decision Making
Peter Dayan, 2008
Decision Theory, Reinforcement Learning, and the Brain
Peter Dayan and Nathaniel D. Daw, 2008
Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot
Gen Endo, Jun Morimoto, Takamitsu Matsubara, Jun Nakanishi, and Gordon Cheng, 2008
Simulation-Based Approach to General Game Playing
Hilmar Finnsson and Yngvi Björnsson, 2008
Feature Discovery in Reinforcement Learning Using Genetic Programming
Sertan Girgin and Philippe Preux, 2008
Accelerated Neural Evolution through Cooperatively Coevolved Synapses
Faustino Gomez, Jürgen Schmidhuber, and Risto Miikkulainen, 2008
Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau, 2008
Similarities and differences between policy gradient methods and evolution strategies
Verena Heidrich-Meisner and Christian Igel, 2008
Evolution Strategies for Direct Policy Search
Verena Heidrich-Meisner and Christian Igel, 2008
Variable Metric Reinforcement Learning Methods Applied to the Noisy Mountain Car Problem
Verena Heidrich-Meisner and Christian Igel, 2008
Temporal Difference Updating without a Learning Rate
Marcus Hutter and Shane Legg, 2008
A new perspective to the keepaway soccer: the takers
Atil Iscen and Umut Erogul, 2008
Model-Based Reinforcement Learning in a Complex Domain
Shivaram Kalyanakrishnan, Peter Stone, and Yaxin Liu, 2008
Cross-Entropy Method for Reinforcement Learning
Steijn Kistemaker, 2008
Multi-armed bandits in metric spaces
Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal, 2008
Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications
William B. Langdon, Riccardo Poli, Nicholas Freitag McPhee, and John R. Koza, 2008
A worst-case comparison between temporal difference and residual gradient with linear function approximation
Lihong Li, 2008
An analysis of reinforcement learning with function approximation
Francisco S. Melo, Sean P. Meyn, and M. Isabel Ribeiro, 2008
Analysis of an Evolutionary Reinforcement Learning Method in a Multiagent Domain
Jan Hendrik Metzen, Mark Edgington, Yohannes Kassahun, and Frank Kirchner, 2008
Empirical Bernstein stopping
Volodymyr Mnih, Csaba Szepesvári, and Jean-Yves Audibert, 2008
Real-time selection and generation of fall damage reduction actions for humanoid robots
Kunihiro Ogata, Koji Terada, and Yasuo Kuniyoshi, 2008
Advanced Data Mining Techniques
David L. Olson and Dursun Delen, 2008
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, and Michael L. Littman, 2008
Reinforcement learning of motor skills with policy gradients
Jan Peters and Stefan Schaal, 2008
Natural Actor-Critic
Jan Peters and Stefan Schaal, 2008
Sample-based Learning and Search with Permanent and Transient Memories
David Silver, Richard S. Sutton, and Martin Müller, 2008
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, and Michael Bowling, 2008
The many faces of optimism: a unifying approach
István Szita and András Lőrincz, 2008
Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
Gerald Tesauro, Rajarshi Das, Hoi Chan, Jeffrey O. Kephart, Charles Lefurgy, David W. Levine, and Freeman Rawson, 2008
Viability and predictive control for safe locomotion
Pierre-Brice Wieber, 2008
Ensemble Algorithms in Reinforcement Learning
Marco Wiering and Hado van Hasselt, 2008
SATzilla: Portfolio-based Algorithm Selection for SAT
Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown, 2008
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
Engin İpek, Onur Mutlu, José Martínez, and Rich Caruana, 2008
Tuning Bandit Algorithms in Stochastic Environments
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári, 2007
Sample Complexity of Policy Search with Known Dynamics
Peter L. Bartlett and Ambuj Tewari, 2007
Distinguishing falls from normal ADL using vertical velocity profiles
Alan K. Bourke, Karol J. O'Donovan, and Gearóid M. ÓLaighin, 2007
An optimal planning of falling motions of a humanoid robot
Kiyoshi Fujiwara, Shuuji Kajita, Kensuke Harada, Kenji Kaneko, Mitsuharu Morisawa, Fumio Kanehiro, Shinichiro Nakaoka, and Hirohisa Hirukawa, 2007
Bayesian actor-critic algorithms
Mohammad Ghavamzadeh and Yaakov Engel, 2007
Bayesian Policy Gradient Algorithms
Mohammad Ghavamzadeh and Yaakov Engel, 2007
Human-Robot Interaction: A Survey
Michael A. Goodrich and Alan C. Schultz, 2007
Approximation Algorithms for Budgeted Learning Problems
Sudipto Guha and Kamesh Munagala, 2007
Learning RoboCup-Keepaway with Kernels
Tobias Jung and Daniel Polani, 2007
Batch Reinforcement Learning in a Complex Domain
Shivaram Kalyanakrishnan and Peter Stone, 2007
Half Field Offense in RoboCup Soccer: A Multiagent Reinforcement Learning Case Study
Shivaram Kalyanakrishnan, Yaxin Liu, and Peter Stone, 2007
The UT Austin Villa 3D Simulation Soccer Team 2007
Shivaram Kalyanakrishnan and Peter Stone, 2007
Recent advances in ranking and selection
Seong-Hee Kim and Barry L. Nelson, 2007
Large Scale Reinforcement Learning using Q-Sarsa($\lambda$) and Cascading Neural Networks
Steffen Nissen, 2007
Fall detection - Principles and Methods
N. Noury, A. Fleury, P. Rumeau, A. K. Bourke, G. ÓLaighin, V. Rialle, and J.E. Lundy, 2007
Falling Motion Control for Humanoid Robots While Walking
Kunihiro Ogata, Koji Terada, and Yasuo Kuniyoshi, 2007
Efficient Failure Detection on Mobile Robots Using Particle Filters with Gaussian Process Proposals
Christian Plagemann, Dieter Fox, and Wolfram Burgard, 2007
On Experiences in a Complex and Competitive Gaming Domain: Reinforcement Learning Meets RoboCup
Martin Riedmiller and Thomas Gabel, 2007
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark
Martin Riedmiller, Jan Peters, and Stefan Schaal, 2007
Autonomous blimp control using model-free reinforcement learning in a continuous state and action space
Axel Rottmann, Christian Plagemann, Peter Hilgers, and Wolfram Burgard, 2007
Learning classifier systems: a survey
Olivier Sigaud and Stewart W. Wilson, 2007
Reinforcement Learning of Local Shape in the Game of Go
David Silver, Richard S. Sutton, and Martin Müller, 2007
On the role of tracking in stationary environments
Richard S. Sutton, Anna Koop, and David Silver, 2007
Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man
István Szita and András Lőrincz, 2007
Representation Transfer for Reinforcement Learning
Matthew E. Taylor and Peter Stone, 2007
Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Matthew E. Taylor, Peter Stone, and Yaxin Liu, 2007
On the use of hybrid reinforcement learning for autonomic resource allocation
Gerald Tesauro, Nicholas K. Jong, Rajarshi Das, and Mohamed N. Bennani, 2007
Adaptive Representations for Reinforcement Learning
Shimon Azariah Whiteson, 2007
Piecewise-Linear Pattern Generator and Reflex System for Humanoid Robots
Riadh Zaier and Shinji Kanda, 2007
See, walk, and kick: Humanoid robots start to play soccer
Sven Behnke, Michael Schreiber, Jörg Stückler, Reimund Renner, and Hauke Strasdat, 2006
Pattern Recognition and Machine Learning
Christopher M. Bishop, 2006
An empirical comparison of supervised learning algorithms
Rich Caruana and Alexandru Niculescu-Mizil, 2006
Learning the structure of Factored Markov Decision Processes in reinforcement learning problems
Thomas Degris, Olivier Sigaud, and Pierre-Henri Wuillemin, 2006
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
Eyal Even-Dar, Shie Mannor, and Yishay Mansour, 2006
Tetris: A Study of Randomized Constraint Sampling
Vivek F. Farias and Benjamin Van Roy, 2006
Towards an Optimal Falling Motion for a Humanoid Robot
Kiyoshi Fujiwara, Shuuji Kajita, Kensuke Harada, Kenji Kaneko, Mitsuharu Morisawa, Fumio Kanehiro, Shinichiro Nakaoka, and Hirohisa Hirukawa, 2006
Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
Abraham P. George and Warren B. Powell, 2006
Hierarchical multi-agent reinforcement learning
Mohammad Ghavamzadeh, Sridhar Mahadevan, and Rajbala Makar, 2006
Reinforcement learning for quasi-passive dynamic walking of an unstable biped robot
Kentarou Hitomi, Tomohiro Shibata, Yutaka Nakamura, and Shin Ishii, 2006
An Overview of Cooperative and Competitive Multiagent Learning
Pieter Jan 't Hoen, Karl Tuyls, Liviu Panait, Sean Luke, and Johannes A. La Poutré, 2006
Looping suffix tree-based inference of partially observable hidden state
Michael P. Holmes and Charles Lee Isbell, Jr., 2006
Bandit Based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006
Evolving a Real-World Vehicle Warning System
Nate Kohl, Kenneth Stanley, Risto Miikkulainen, Michael Samples, and Rini Sherony, 2006
Stepping Motion for a Human-like Character to Maintain Balance against Large Perturbations
Shunsuke Kudoh, Taku Komura, and Katsushi Ikeuchi, 2006
Quadruped Robot Obstacle Negotiation via Reinforcement Learning
Honglak Lee, Yirong Shen, Chih-Han Yu, Gurjeet Singh, and Andrew Y. Ng, 2006
Relaxed fault detection and isolation: An application to a nonlinear case study
Raffaella Mattone and Alessandro De Luca, 2006
Reinforcement learning for optimized trade execution
Yuriy Nevmyvaka, Yi Feng, and Michael Kearns, 2006
Balance Control of a Humanoid Robot Based on the Reaction Null Space Method
Akinori Nishio, Kentaro Takahashi, and Dragomir N. Nenchev, 2006
Anytime Point-Based Approximations for Large POMDPs
Joelle Pineau, Geoffrey J. Gordon, and Sebastian Thrun, 2006
Capture Point: A Step toward Humanoid Push Recovery
Jerry Pratt, John Carff, Sergey Drakunov, and Ambarish Goswami, 2006
Instability Detection and Fall Avoidance for a Humanoid using Attitude Sensors and Reflexes
Reimund Renner and Sven Behnke, 2006
Integrating Techniques from Statistical Ranking into Evolutionary Algorithms
Christian Schmidt, Jürgen Branke, and Stephen E. Chick, 2006
Keepaway Soccer: From Machine Learning Testbed to Benchmark
Peter Stone, Gregory Kuhlmann, Matthew E. Taylor, and Yaxin Liu, 2006
PAC model-free reinforcement learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, 2006
Getting Back on Two Feet: Reliable Standing-up Routines for a Humanoid Robot
Jörg Stückler, Johannes Schwenk, and Sven Behnke, 2006
Learning Tetris using the noisy cross-entropy method
István Szita and András Lőrincz, 2006
Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson and Peter Stone, 2006
On-line evolutionary computation for reinforcement learning in stochastic domains
Shimon Whiteson and Peter Stone, 2006
An Evolutionary Approach to Tetris
Niko Böhm, Gabriella Kókai, and Stefan Mandl, 2005
An Adaptive Sampling Algorithm for Solving Markov Decision Processes
Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu, and Steven I. Marcus, 2005
Tree-Based Batch Mode Reinforcement Learning
Damien Ernst, Pierre Geurts, and Louis Wehenkel, 2005
Sensory reflex control for humanoid walking
Qiang Huang and Yoshihiko Nakamura, 2005
Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
Terran Lane and William D. Smart, 2005
Evolving a Neural Network Location Evaluator to Play Ms. Pac-Man
Simon M. Lucas, 2005
Basis Function Adaptation in Temporal Difference Reinforcement Learning
Ishai Menache, Shie Mannor, and Nahum Shimkin, 2005
Spark - A generic simulator for physical multi-agent simulations
Oliver Obst and Markus Rollmann, 2005
Cooperative Multi-Agent Learning: The State of the Art
Liviu Panait and Sean Luke, 2005
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
Martin Riedmiller, 2005
Function Approximation via Tile Coding: Automating Parameter Choice
Alexander A. Sherstov and Peter Stone, 2005
Reinforcement Learning for RoboCup-Soccer Keepaway
Peter Stone, Richard S. Sutton, and Gregory Kuhlmann, 2005
A theoretical analysis of Model-Based Interval Estimation
Alexander L. Strehl and Michael L. Littman, 2005
Zero-Moment Point - Thirty Five Years of its Life
Miomir Vukobratović and Branislav Borovac, 2005
Evolving Soccer Keepaway Players Through Task Decomposition
Shimon Whiteson, Nate Kohl, Risto Miikkulainen, and Peter Stone, 2005
Data Mining: Practical machine learning tools and techniques
Ian H. Witten and Eibe Frank, 2005
A Tutorial on the Cross-Entropy Method
Pieter-Tjerk de Boer, Dirk P. Kroese, Shie Mannor, and Reuven Y. Rubinstein, 2005
Sequential Sampling in Noisy Environments
Jürgen Branke and Christian Schmidt, 2004
Tetris is hard, even to approximate
Ron Breukelaar, Erik D. Demaine, Susan Hohenberger, Hendrik Jan Hoogeboom, Walter A. Kosters, and David Liben-Nowell, 2004
Failure diagnosis using decision trees
Mike Chen, Alice X. Zheng, Jim Lloyd, Michael I. Jordan, and Eric Brewer, 2004
Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis
Claudia V. Goldman and Shlomo Zilberstein, 2004
Machine Learning for Fast Quadrupedal Locomotion
Nate Kohl and Peter Stone, 2004
Sparse cooperative Q-learning
Jelle R. Kok and Nikos Vlassis, 2004
Reinforcement learning for sensing strategies
Cody Kwok and Dieter Fox, 2004
Distinctive Image Features from Scale-Invariant Keypoints
David G. Lowe, 2004
Active Model Selection
Omid Madani, Daniel J. Lizotte, and Russell Greiner, 2004
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
Shie Mannor and John N. Tsitsiklis, 2004
Convergence of synchronous reinforcement learning with linear function approximation
Artur Merke and Ralf Schoknecht, 2004
Webots™: Professional Mobile Robot Simulation
Olivier Michel, 2004
Autonomous Helicopter Flight via Reinforcement Learning
Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, and Shankar Sastry, 2004
On the Numeric Stability of Gaussian Processes Regression for Relational Reinforcement Learning
Jan Ramon and Kurt Driessens, 2004
Sparse Distributed Memories for On-Line Value-Based Reinforcement Learning
Bohdana Ratitch and Doina Precup, 2004
Multi-Agent Patrolling with Reinforcement Learning
Hugo Santana, Geber Ramalho, Vincent Corruble, and Bohdana Ratitch, 2004
Temporal difference models describe higher-order learning in humans
Ben Seymour, John P. O'Doherty, Peter Dayan, Martin Koltzenburg, Anthony K. Jones, Raymond J. Dolan, Karl J. Friston, and Richard S. Frackowiak, 2004
Efficient Evolution of Neural Networks Through Complexification
Kenneth Owen Stanley, 2004
Stochastic policy gradient reinforcement learning on a simple 3D biped
Russ Tedrake, Teresa Weirui Zhang, and H. Sebastian Seung, 2004
GenSo-EWS: a novel neural-fuzzy based early warning system for predicting bank failures
W. L. Tung, C. Quek, and P. Cheng, 2004
Adaptive Job Routing and Scheduling
Shimon Whiteson and Peter Stone, 2004
A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observations
Bram Bakker, Viktor Zhumatiy, Gabriel Gruener, and Jürgen Schmidhuber, 2003
Using Ranking and Selection to Clean Up after Simulation Optimization
Justin Boesel, Barry L. Nelson, and Seong-Hee Kim, 2003
R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
Ronen I. Brafman and Moshe Tennenholtz, 2003
Users Manual: RoboCup Soccer Server --- for Soccer Server Version 7.07 and Later
Mao Chen, Klaus Dorer, Ehsan Foroughi, Fredrick Heintz, ZhanXiang Huang, Spiros Kapetanakis, Kostas Kostiadis, Johan Kummeneje, Jan Murray, Itsuki Noda, Oliver Obst, Pat Riley, Timo Steffens, Yi Wang, and Xiang Yin, 2003
SPEEDY: A Fall Detector in a Wrist Watch
Thomas Degen, Heinz Jaeckel, Michael Rufer, and Stefan Wyss, 2003
Learning to play Pac-Man: An Evolutionary, Rule-based Approach
Marcus Gallagher and Amanda Ryan, 2003
Active Guidance for a Finless Rocket Using Neuroevolution
Faustino J. Gomez and Risto Miikkulainen, 2003
Biped walking pattern generation by a simple three-dimensional inverted pendulum model
Shuuji Kajita, Fumio Kanehiro, Kenji Kaneko, Kiyoshi Fujiwara, Kazuhito Yokoi, and Hirohisa Hirukawa, 2003
Survey of Intelligent Control Techniques for Humanoid Robots
Duško Katić and Miomir Vukobratović, 2003
On Actor-Critic Algorithms
Vijay R. Konda and John N. Tsitsiklis, 2003
Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr, 2003
Reinforcement Learning as Classification: Leveraging Modern Classifiers
Michail G. Lagoudakis and Ronald Parr, 2003
Boosting as a Metaphor for Algorithm Design
Kevin Leyton-Brown, Eugene Nudelman, Galen Andrew, Jim McFadden, and Yoav Shoham, 2003
Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem
Shie Mannor and John N. Tsitsiklis, 2003
Least Squares Policy Evaluation Algorithms with Linear Function Approximation
A. Nedić and D. P. Bertsekas, 2003
A Convergent Form of Approximate Policy Iteration
Theodore J. Perkins and Doina Precup, 2003
Using MDP Characteristics to Guide Exploration in Reinforcement Learning
Bohdana Ratitch and Doina Precup, 2003
Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
Ralf Schoknecht, 2003
An Agent that Learns to Play Pacman
Donald Shepherd, 2003
Introduction to Stochastic Search and Optimization
James C. Spall, 2003
Monitoring and early warning for Internet worms
Cliff Changchun Zou, Lixin Gao, Weibo Gong, and Don Towsley, 2003
Scaling Internal-State Policy-Gradient Methods for POMDPs
Douglas Aberdeen and Jonathan Baxter, 2002
Details
Finite-time Analysis of the Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer, 2002
Details
The Nonstochastic Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire, 2002
Details
Threshold selection, hypothesis tests, and DOE methods
Thomas Beielstein and Sandor Markon, 2002
Details
The Complexity of Decentralized Control of Markov Decision Processes
Daniel S. Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein, 2002
Details
An $\epsilon$-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes
Blai Bonet, 2002
Details
Technical Update: Least-Squares Temporal Difference Learning
Justin A. Boyan, 2002
Details
Deep Blue
Murray Campbell, A. Joseph Hoane Jr., and Feng-hsiung Hsu, 2002
Details
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
Eyal Even-Dar, Shie Mannor, and Yishay Mansour, 2002
Details
Optimization for simulation: Theory vs. Practice
Michael C. Fu, 2002
Details
UKEMI: Falling motion control to minimize damage to biped humanoid robot
Kiyoshi Fujiwara, Fumio Kanehiro, Shuji Kajita, Kenji Kaneko, Kazuhito Yokoi, and Hirohisa Hirukawa, 2002
Details
Coordinated Reinforcement Learning
Carlos Guestrin, Michail G. Lagoudakis, and Ronald Parr, 2002
Details
Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
Shanti S. Gupta and S. Panchapakesan, 2002
Details
Mining complex models from arbitrarily large databases in constant time
Geoff Hulten and Pedro Domingos, 2002
Details
Discriminative, Generative and Imitative learning
Tony Jebara, 2002
Details
Approximately Optimal Approximate Reinforcement Learning
Sham Kakade and John Langford, 2002
Details
Near-Optimal Reinforcement Learning in Polynomial Time
Michael Kearns and Satinder Singh, 2002
Details
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
Michael Kearns, Yishay Mansour, and Andrew Y. Ng, 2002
Details
Least-Squares Methods in Reinforcement Learning for Control
Michail G. Lagoudakis, Ronald Parr, and Michael L. Littman, 2002
Details
Variable Resolution Discretization in Optimal Control
Rémi Munos and Andrew Moore, 2002
Details
Balance control analysis of humanoid robot based on ZMP feedback control
Napoleon, Shigeki Nakaura, and Mitsuji Sampei, 2002
Details
Kernel-Based Reinforcement Learning
Dirk Ormoneit and Śaunak Sen, 2002
Details
On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains
Theodore J. Perkins and Mark D. Pendrith, 2002
Details
Reinforcement Learning for POMDPs Based on Action Values and Stochastic Optimization
Theodore J. Perkins, 2002
Details
Learning from Scarce Experience
Leonid Peshkin and Christian R. Shelton, 2002
Details
Characterizing Markov Decision Processes
Bohdana Ratitch and Doina Precup, 2002
Details
The intelligent ASIMO: system overview and integration
Yoshiaki Sakagami, Ryujin Watanabe, Chiaki Aoyama, Shinichi Matsunaga, Nobuo Higaki, and Kikuo Fujimura, 2002
Details
A Perspective View and Survey of Meta-Learning
Ricardo Vilalta and Youssef Drissi, 2002
Details
On the stability of walking systems
Pierre-Brice Wieber, 2002
Details
Evolution strategies in noisy environments - a survey of existing work
D. V. Arnold, 2001
Details
Scaling to Very Very Large Corpora for Natural Language Disambiguation
Michele Banko and Eric Brill, 2001
Details
Infinite-Horizon Policy-Gradient Estimation
Jonathan Baxter and Peter L. Bartlett, 2001
Details
Random Forests
Leo Breiman, 2001
Details
Batch Value Function Approximation via Support Vectors
Thomas G. Dietterich and Xin Wang, 2001
Details
Convergence of Optimistic and Incremental Q-Learning
Eyal Even-Dar and Yishay Mansour, 2001
Details
Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State
Matthew R. Glickman and Katia Sycara, 2001
Details
Algorithm portfolios
Carla P. Gomes and Bart Selman, 2001
Details
Max-norm Projections for Factored MDPs
Carlos Guestrin, Daphne Koller, and Ronald Parr, 2001
Details
Multiagent Planning with Factored MDPs
Carlos Guestrin, Daphne Koller, and Ronald Parr, 2001
Details
AutoBalancer: An Online Dynamic Balance Compensation Scheme for Humanoid Robots
Satoshi Kagami, Fumio Kanehiro, Yukiharu Tamiya, Masayuki Inaba, and Hirochika Inoue, 2001
Details
A Natural Policy Gradient
Sham Kakade, 2001
Details
A fully sequential procedure for indifference-zone selection in simulation
Seong-Hee Kim and Barry L. Nelson, 2001
Details
Thresholding - a selection operator for noisy ES
Sandor Markon, Dirk V. Arnold, Thomas Bäck, Thomas Beielstein, and Hans-Georg Beyer, 2001
Details
Learning to trade via direct reinforcement
John Moody and Matthew Saffell, 2001
Details
On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
Andrew Y. Ng and Michael I. Jordan, 2001
Details
Off-Policy Temporal Difference Learning with Function Approximation
Doina Precup, Richard S. Sutton, and Sanjoy Dasgupta, 2001
Details
On the Convergence of Temporal-Difference Learning with Linear Function Approximation
Vladislav Tadić, 2001
Details
Reinforcement Learning in POMDP's via Direct Gradient Ascent
Jonathan Baxter and Peter L. Bartlett, 2000
Details
Evolutionary algorithms in noisy environments: theoretical issues and guidelines for practice
Hans-Georg Beyer, 2000
Details
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
Thomas G. Dietterich, 2000
Details
Mining high-speed data streams
Pedro Domingos and Geoff Hulten, 2000
Details
Planning treatment of ischemic heart disease with partially observable Markov decision processes
Milos Hauskrecht and Hamish Fraser, 2000
Details
Value-Function Approximations for Partially Observable Markov Decision Processes
Milos Hauskrecht, 2000
Details
Local Search Algorithms for SAT: An Empirical Evaluation
Holger H. Hoos and Thomas Stützle, 2000
Details
Policy Iteration for Factored MDPs
Daphne Koller and Ronald Parr, 2000
Details
Policy Search via Density Estimation
Andrew Y. Ng, Ronald Parr, and Daphne Koller, 2000
Details
PEGASUS: A policy search method for large MDPs and POMDPs
Andrew Y. Ng and Michael Jordan, 2000
Details
Meta-Learning by Landmarking Various Learning Algorithms
Bernhard Pfahringer, Hilan Bensusan, and Christophe Giraud-Carrier, 2000
Details
Exploiting Inherent Robustness and Natural Dynamics in the Control of Bipedal Walking Robots
Jerry E. Pratt, 2000
Details
Eligibility Traces for Off-Policy Policy Evaluation
Doina Precup, Richard S. Sutton, and Satinder P. Singh, 2000
Details
Optimization of Noisy Fitness Functions by Means of Genetic Algorithms Using History of Search
Yasuhito Sano and Hajime Kita, 2000
Details
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári, 2000
Details
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour, 2000
Details
Monte Carlo POMDPs
Sebastian Thrun, 2000
Details
Gradient Descent for General Reinforcement Learning
Leemon Baird and Andrew Moore, 1999
Details
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
Eric Bauer and Ron Kohavi, 1999
Details
Reinforcement Learning for Control of Self-Similar Call Traffic in Broadband Networks
Jakob Carlström and Ernst Nordström, 1999
Details
Activity Monitoring: Noticing Interesting Changes in Behavior
Tom Fawcett and Foster Provost, 1999
Details
Selecting and Ordering Populations: A New Statistical Methodology
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, 1999
Details
Solving Non-Markovian Control Tasks with Neuro-Evolution
Faustino J. Gomez and Risto Miikkulainen, 1999
Details
An empirical evaluation of several methods to select the best system
Koichiro Inoue, Stephen E. Chick, and Chun-Hung Chen, 1999
Details
Evolutionary Algorithms for Reinforcement Learning
David E. Moriarty, Alan C. Schultz, and John J. Grefenstette, 1999
Details
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
Andrew Y. Ng, Daishi Harada, and Stuart J. Russell, 1999
Details
Convergence of Reinforcement Learning With General Function Approximators
Vassilis A. Papavassiliou and Stuart Russell, 1999
Details
Reinforcement Learning Using Approximate Belief States
Andrés Rodríguez, Ronald Parr, and Daphne Koller, 1999
Details
Distributed Value Functions
Jeff Schneider, Weng-Keen Wong, Andrew Moore, and Martin Riedmiller, 1999
Details
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Richard S. Sutton, Doina Precup, and Satinder P. Singh, 1999
Details
On-Line New Event Detection and Tracking
James Allan, Ron Papka, and Victor Lavrenko, 1998
Details
Learning hierarchical control structures for multiple tasks and changing environments
Bruce L. Digney, 1998
Details
Robot Shaping: An Experiment in Behavior Engineering
Marco Dorigo and Marco Colombetti, 1998
Details
Neural Networks: A Comprehensive Foundation
Simon Haykin, 1998
Details
Symposium on Applications of Reinforcement Learning: Final Report for NSF Grant IIS-9810208
Pat Langley and Mark Pendrith, 1998
Details
Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
John Loch and Satinder Singh, 1998
Details
Q2: Memory-Based Active Learning for Optimizing Noisy Continuous Functions
Andrew W. Moore, Jeff G. Schneider, Justin A. Boyan, and Mary S. Lee, 1998
Details
Hierarchical Control and Learning for Markov Decision Processes
Ronald Edward Parr, 1998
Details
An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
Mark D. Pendrith and Michael J. McGarity, 1998
Details
Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
Jette Randløv and Preben Alstrøm, 1998
Details
Averaging Efficiently in the Presence of Noise
Peter Stagge, 1998
Details
Layered Learning in Multi-Agent Systems
Peter Stone, 1998
Details
Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 1998
Details
Learning and Value Function Approximation in Complex Decision Processes
Benjamin Van Roy, 1998
Details
A Comparison of Direct and Model-Based Reinforcement Learning
Christopher G. Atkeson and Juan Carlos Santamaría, 1997
Details
How to Lose at Tetris
Heidi Burgiel, 1997
Details
Multitask Learning
Rich Caruana, 1997
Details
Machine-Learning Research: Four Current Directions
Thomas G. Dietterich, 1997
Details
The Racing Algorithm: Model Selection for Lazy Learners
Oded Maron and Andrew W. Moore, 1997
Details
Reinforcement Learning in the Multi-Robot Domain
Maja J. Matarić, 1997
Details
Alarm effectiveness in driver-centred collision-warning systems
R. Parasuraman, P. A. Hancock, and O. Olofinboba, 1997
Details
Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
Satinder Singh and Dimitri Bertsekas, 1997
Details
An analysis of temporal-difference learning with function approximation
John N. Tsitsiklis and Benjamin Van Roy, 1997
Details
No free lunch theorems for optimization
David H. Wolpert and William G. Macready, 1997
Details
Exponentially many local minima for single neurons
Peter Auer, Mark Herbster, and Manfred K. Warmuth, 1996
Details
Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996
Details
Linear Least-Squares Algorithms for Temporal Difference Learning
Steven J. Bradtke and Andrew G. Barto, 1996
Details
Improving Elevator Performance Using Reinforcement Learning
Robert H. Crites and Andrew G. Barto, 1996
Details
Experiments with a New Boosting Algorithm
Yoav Freund and Robert E. Schapire, 1996
Details
Stable Fitted Reinforcement Learning
Geoffrey J. Gordon, 1996
Details
Simulated Annealing for noisy cost functions
Walter J. Gutjahr and Georg Ch. Pflug, 1996
Details
Reinforcement Learning with Selective Perception and Hidden State
Andrew Kachites McCallum, 1996
Details
Genetic Algorithms, Selection Schemes, and the Varying Effects of Noise
Brad L. Miller and David E. Goldberg, 1996
Details
Memory-based Stochastic Optimization
Andrew W. Moore and Jeff Schneider, 1996
Details
Incremental Multi-Step Q-Learning
Jing Peng and Ronald J. Williams, 1996
Details
Bagging, Boosting, and C4.5
J. Ross Quinlan, 1996
Details
Evolution-Based Discovery of Hierarchical Behaviors
Justinian P. Rosca and Dana H. Ballard, 1996
Details
Reinforcement learning with replacing eligibility traces
Satinder P. Singh and Richard S. Sutton, 1996
Details
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
Richard S. Sutton, 1996
Details
Feature-based methods for large scale dynamic programming
John N. Tsitsiklis and Benjamin Van Roy, 1996
Details
Residual Algorithms: Reinforcement Learning with Function Approximation
Leemon Baird, 1995
Details
Design and analysis of experiments for statistical selection, screening, and multiple comparisons
Robert E. Bechhofer, Thomas J. Santner, and David M. Goldsman, 1995
Details
A Counterexample to Temporal Differences Learning
Dimitri P. Bertsekas, 1995
Details
Generalization in Reinforcement Learning: Safely Approximating the Value Function
Justin A. Boyan and Andrew W. Moore, 1995
Details
Recursive Automatic Bias Selection for Classifier Construction
Carla E. Brodley, 1995
Details
Stable Function Approximation in Dynamic Programming
Geoffrey J. Gordon, 1995
Details
Evaluation and Selection of Biases in Machine Learning
Diana F. Gordon and Marie desJardins, 1995
Details
Strongly Typed Genetic Programming in Evolving Cooperation Strategies
Thomas Haynes, Roger L. Wainwright, Sandip Sen, and Dale A. Schoenefeld, 1995
Details
Reinforcement Learning Algorithm for Partially Observable Markov Problems
Tommi Jaakkola, Satinder P. Singh, and Michael I. Jordan, 1995
Details
Applications of machine learning and rule induction
Pat Langley and Herbert A. Simon, 1995
Details
On the Complexity of Solving Markov Decision Problems
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling, 1995
Details
Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State
R. Andrew McCallum, 1995
Details
The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces
Andrew W. Moore and Christopher G. Atkeson, 1995
Details
Approximating Optimal Policies for Partially Observable Stochastic Domains
Ronald Parr and Stuart Russell, 1995
Details
Methods for Competitive Co-Evolution: Finding Opponents Worth Beating
Christopher D. Rosin and Richard K. Belew, 1995
Details
Problem Solving with Reinforcement Learning
Gavin Adrian Rummery, 1995
Details
Sequential PAC Learning
Dale Schuurmans and Russell Greiner, 1995
Details
Artificial Intelligence: An Empirical Science
Herbert A. Simon, 1995
Details
Reinforcement Learning with Soft State Aggregation
Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan, 1995
Details
A Reinforcement Learning Approach to job-shop Scheduling
Wei Zhang and Thomas G. Dietterich, 1995
Details
Acting optimally in partially observable stochastic domains
Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman, 1994
Details
Using a Genetic Algorithm to Search for the Representational Bias of a Collective Reinforcement Learner
Helen G. Cobb and Peter Bock, 1994
Details
TD($\lambda$) Converges with Probability 1
Peter Dayan and Terrence J. Sejnowski, 1994
Details
An Introduction to Computational Learning Theory
Michael J. Kearns and Umesh V. Vazirani, 1994
Details
Markov Decision Processes
Martin L. Puterman, 1994
Details
On-line Q-learning using connectionist systems
G. A. Rummery and M. Niranjan, 1994
Details
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan, 1994
Details
An Upper Bound on the Loss from Approximate Optimal-Value Functions
Satinder P. Singh and Richard C. Yee, 1994
Details
On bias and step size in temporal-difference learning
Richard S. Sutton and Satinder P. Singh, 1994
Details
Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
Ronald J. Williams and Leemon C. Baird III, 1994
Details
Reinforcement Learning Applied to Linear Quadratic Regulation
Steven J. Bradtke, 1993
Details
Benchmarks, Test Beds, Controlled Experimentation, and the Design of Agent Architectures
Steve Hanks, Martha E. Pollack, and Paul R. Cohen, 1993
Details
Reinforcement learning with hidden states
Long-Ji Lin and Tom M. Mitchell, 1993
Details
An Optimization-based Categorization of Reinforcement Learning Environments
Michael L. Littman, 1993
Details
Overcoming Incomplete Perception with Utile Distinction Memory
R. Andrew McCallum, 1993
Details
Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time
Andrew W. Moore and Christopher G. Atkeson, 1993
Details
Efficient learning and planning within the Dyna framework
Jing Peng and Ronald J. Williams, 1993
Details
Approximating Q-Values with Basis Function Representations
Philip Sabes, 1993
Details
Online Learning with Random Representations
Richard S. Sutton and Steven D. Whitehead, 1993
Details
Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents
Ming Tan, 1993
Details
Issues in Using Function Approximation for Reinforcement Learning
Sebastian Thrun and Anton Schwartz, 1993
Details
Interactions between Learning and Evolution
David Ackley and Michael Littman, 1992
Details
Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach
Lonnie Chrisman, 1992
Details
Inductive Biases in a Reinforcement Learner
Helen G. Cobb, 1992
Details
The Convergence of TD($\lambda$) for General $\lambda$
Peter Dayan, 1992
Details
Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
Long-Ji Lin, 1992
Details
Practical Issues in Temporal Difference Learning
Gerald Tesauro, 1992
Details
Q-Learning
Christopher J. C. H. Watkins and Peter Dayan, 1992
Details
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
Ronald J. Williams, 1992
Details
Viability Theory
Jean-Pierre Aubin, 1991
Details
Intelligence without Representation
Rodney A. Brooks, 1991
Details
Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control
Ming Tan, 1991
Details
Predicting Bank Failures in the 1980s
James B. Thomson, 1991
Details
A Proportional Hazards Model of Bank Failure: An Examination of its Usefulness as an Early Warning Tool
Gary Whalen, 1991
Details
Learning to perceive and act by trial and error
Steven D. Whitehead and Dana H. Ballard, 1991
Details
Learning Sequential Decision Rules Using Simulation Models and Competition
John J. Grefenstette, Connie Loggia Ramsey, and Alan C. Schultz, 1990
Details
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
Richard S. Sutton, 1990
Details
Multilayer feedforward networks are universal approximators
Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White, 1989
Details
Restricted Subset Selection Procedures for Simulation
David W. Sullivan and James R. Wilson, 1989
Details
An algorithm for automated tsunami warning in French Polynesia based on mantle magnitudes
Jacques Talandier and Emile A. Okal, 1989
Details
Learning from Delayed Rewards
Christopher John Cornish Hellaby Watkins, 1989
Details
How Evaluation Guides AI Research: The Message Still Counts More than the Medium
Paul R. Cohen and Adele E. Howe, 1988
Details
Genetic algorithms in noisy environments
J. Michael Fitzpatrick and John J. Grefenstette, 1988
Details
Survey of model-based failure detection and isolation in complex plants
J. J. Gertler, 1988
Details
Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework
David Haussler, 1988
Details
Machine Learning as an Experimental Science
Pat Langley, 1988
Details
Learning to Predict By the Methods of Temporal Differences
Richard S. Sutton, 1988
Details
Further Real Applications of Markov Decision Processes
Douglas J. White, 1988
Details
Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm
Nick Littlestone, 1987
Details
On Optimal Cooperation of Knowledge Sources - An Empirical Investigation
M. Benda, V. Jagannathan, and R. Dodhiawala, 1986
Details
Shift of Bias for Inductive Concept Learning
Paul E. Utgoff, 1986
Details
Bandit problems
Donald A. Berry and Bert Fristedt, 1985
Details
A procedure for selecting a subset of size $m$ containing the $l$ best of $k$ independent normal populations, with applications to simulation
Lloyd W. Koenig and Averill M. Law, 1985
Details
Asymptotically Efficient Adaptive Allocation Rules
T. L. Lai and Herbert Robbins, 1985
Details
Real Applications of Markov Decision Processes
Douglas J. White, 1985
Details
Neuronlike adaptive elements that can solve difficult learning control problems
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson, 1983
Details
A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
George E. Monahan, 1982
Details
Brains, Behavior and Robotics
James Sacra Albus, 1981
Details
The Need for Biases in Learning Generalizations
Tom M. Mitchell, 1980
Details
Early Warning Indicators of Business Failure
Subhash Sharma and Vijay Mahajan, 1980
Details
The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs
Edward J. Sondik, 1978
Details
Determining Sample Size for Pretesting Comparative Effectiveness of Advertising Copies
Siddhartha R. Dalal and V. Srinivasan, 1977
Details
Sequential models for clinical trials
Herman Chernoff, 1967
Details
Optimal Control of Markov Processes with Incomplete State Information
K. J. Åström, 1965
Details
A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations
Edward Paulson, 1964
Details
Probability Inequalities for Sums of Bounded Random Variables
Wassily Hoeffding, 1963
Details
The Future of Data Analysis
John W. Tukey, 1962
Details
Comparing entries in random sample tests
W. A. Becker, 1961
Details
A Sequential Multiple-Decision Procedure for Selecting the Best One of Several Normal Populations with a Common Unknown Variance, and Its Use with Various Experimental Designs
Robert E. Bechhofer, 1958
Details
Dynamic Programming
Richard Bellman, 1957
Details
Some aspects of the sequential design of experiments
Herbert Robbins, 1952
Details
Sequential Analysis
Abraham Wald, 1947
Details
Contributions to the Theory of Sequential Analysis. I
M. A. Girshick, 1946
Details
Contributions to the Theory of Sequential Analysis, II, III
M. A. Girshick, 1946
Details