Peter Stone's Selected Publications

Empirical Studies in Action Selection for Reinforcement Learning

Empirical Studies in Action Selection for Reinforcement Learning.
Shimon Whiteson, Matthew E. Taylor, and Peter Stone.
Adaptive Behavior, 15(1):33–50, March 2007.

Abstract

To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. This article aims to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together. First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to each method's performance. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.

BibTeX Entry

@Article{AB07,
	author="Shimon Whiteson and Matthew E.\ Taylor and Peter Stone",
	title="Empirical Studies in Action Selection for Reinforcement Learning",
	journal="Adaptive Behavior",
	year="2007",
	volume="15",
	number="1",
	month="March",
	pages="33--50",
	abstract="
                  To excel in challenging tasks, intelligent agents
                  need sophisticated mechanisms for action selection:
                  they need policies that dictate what action to take
                  in each situation.  Reinforcement learning (RL)
                  algorithms are designed to learn such policies given
                  only positive and negative rewards.  Two contrasting
                  approaches to RL that are currently in popular use
                  are temporal difference (TD) methods, which learn
                  value functions, and evolutionary methods, which
                  optimize populations of candidate policies.  Both
                  approaches have had practical successes but few
                  studies have directly compared them.  Hence, there
                  are no general guidelines describing their relative
                  strengths and weaknesses.  In addition, there has
                  been little cross-collaboration, with few attempts
                  to make them work together or to apply ideas from
                  one to the other.  This article aims to address
                  these shortcomings via three empirical studies that
                  compare these methods and investigate new ways of
                  making them work together.
                  First, we compare the two approaches in a benchmark
                  task and identify variations of the task that
                  isolate factors critical to each method's
                  performance.  Second, we investigate ways to make
                  evolutionary algorithms excel at on-line tasks by
                  borrowing exploratory mechanisms traditionally used
                  by TD methods.  We present empirical results
                  demonstrating a dramatic performance improvement.
                  Third, we explore a novel way of making evolutionary
                  and TD methods work together by using evolution to
                  automatically discover good representations for TD
                  function approximators.  We present results
                  demonstrating that this novel approach can
                  outperform both TD and evolutionary methods alone.
	",
}
