Peter Stone's Selected Publications

Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning

Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning.
Matthew Taylor, Shimon Whiteson, and Peter Stone.
In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1321–1328, July 2006.
BEST PAPER AWARD at GECCO 2006

Abstract

Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods' relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT [stanley:ec02evolving], a GA that evolves neural networks, with Sarsa [Rummery94, Singh96], a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.
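
For readers unfamiliar with the TD side of the comparison, here is a minimal sketch of the generic one-step tabular Sarsa update with an epsilon-greedy action choice. Everything in it is an illustrative assumption and not the paper's Keepaway learner, which this page does not describe. That includes the function names, the dictionary-based Q-table, the parameters alpha, gamma and epsilon, and the made-up state/action labels. The core idea is that Sarsa is on-policy: it moves Q(s, a) toward r + gamma * Q(s', a'), where a' is the action the agent actually takes next.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=False):
    """One-step Sarsa: move Q(s, a) toward r + gamma * Q(s', a').

    For a terminal transition the bootstrap term is dropped.
    Returns the TD error so callers can inspect it.
    """
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    td_error = target - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


if __name__ == "__main__":
    # Hypothetical two-state example; unseen (state, action) pairs default to 0.0.
    Q = defaultdict(float)
    Q[("s1", "hold")] = 1.0
    rng = random.Random(0)
    a_next = epsilon_greedy(Q, "s1", ["hold", "pass"], epsilon=0.0, rng=rng)
    delta = sarsa_update(Q, "s0", "pass", 0.5, "s1", a_next, alpha=0.1, gamma=0.9)
    print(a_next, delta, Q[("s0", "pass")])
```

With epsilon set to 0 the helper is purely greedy, so it selects "hold". The update's target is then 0.5 + 0.9 * 1.0, which moves Q("s0", "pass") one tenth of the way from 0 toward that target.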

BibTeX Entry

@InProceedings{GECCO06-matt,
	author="Matthew Taylor and Shimon Whiteson and Peter Stone",
	title="Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning",
	booktitle="Proceedings of the Genetic and Evolutionary Computation Conference",
	month="July", year="2006",
	pages="1321--1328",
	abstract={
                  Both genetic algorithms (GAs) and temporal
                  difference (TD) methods have proven effective at
                  solving reinforcement learning (RL) problems.
                  However, since few rigorous empirical comparisons
                  have been conducted, there are no general guidelines
                  describing the methods' relative strengths and
                  weaknesses.  This paper presents the results of a
                  detailed empirical comparison between a GA and a TD
                  method in Keepaway, a standard RL benchmark domain
                  based on robot soccer.  In particular, we compare
                  the performance of NEAT~\cite{stanley:ec02evolving},
                  a GA that evolves neural networks, with
                  Sarsa~\cite{Rummery94,Singh96}, a popular TD method.
                  The results demonstrate that NEAT can learn better
                  policies in this task, though it requires more
                  evaluations to do so.  Additional experiments in two
                  variations of Keepaway demonstrate that Sarsa learns
                  better policies when the task is fully observable
                  and NEAT learns faster when the task is
                  deterministic.  Together, these results help isolate
                  the factors critical to the performance of each
                  method and yield insights into their general
                  strengths and weaknesses.
	},
	wwwnote={<b>BEST PAPER AWARD</b> at <a href="http://www.sigevo.org/gecco-2006/">GECCO 2006</a>},
}