On-Line Evolutionary Computation for Reinforcement Learning in Stochastic Domains.
Shimon Whiteson and Peter Stone.
In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1577–84, July 2006.
GECCO 2006
In reinforcement learning, an agent interacting with its environment strives to learn a policy that specifies, for each state it may encounter, what action to take. Evolutionary computation is one of the most promising approaches to reinforcement learning but its success is largely restricted to off-line scenarios. In on-line scenarios, an agent must strive to maximize the reward it accrues while it is learning. Temporal difference (TD) methods, another approach to reinforcement learning, naturally excel in on-line scenarios because they have selection mechanisms for balancing the need to search for better policies (exploration) with the need to accrue maximal reward (exploitation). This paper presents a novel way to strike this balance in evolutionary methods by borrowing the selection mechanisms used by TD methods to choose individual actions and using them in evolution to choose policies for evaluation. Empirical results in the mountain car and server job scheduling domains demonstrate that these techniques can substantially improve evolution's on-line performance in stochastic domains.
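The idea in the abstract, reusing a TD-style selection mechanism to decide which policy in the population gets the next evaluation episode, can be illustrated with a minimal sketch. It assumes epsilon-greedy as the selection mechanism (the abstract does not name one), and the names `select_policy`, `evaluate_generation`, and `episode_fns` are hypothetical, chosen only for this illustration. It is not the authors' implementation.

```python
import random


def select_policy(avg_fitness, epsilon, rng):
    """Pick the index of the policy to evaluate next (epsilon-greedy).

    avg_fitness[i] is the mean episode reward seen so far for policy i,
    or None if policy i has not been evaluated yet. (Assumed layout.)
    """
    # Give every policy one initial evaluation before exploiting.
    for i, fitness in enumerate(avg_fitness):
        if fitness is None:
            return i
    # Exploration: with probability epsilon, pick a policy at random.
    if rng.random() < epsilon:
        return rng.randrange(len(avg_fitness))
    # Exploitation: otherwise pick the policy with the best average so far.
    return max(range(len(avg_fitness)), key=lambda i: avg_fitness[i])


def evaluate_generation(episode_fns, episodes, epsilon, rng):
    """Spend a fixed budget of episodes on one generation.

    episode_fns[i](rng) runs one episode with policy i and returns its
    reward (hypothetical interface). Returns (average rewards, counts).
    """
    n = len(episode_fns)
    totals = [0.0] * n
    counts = [0] * n
    for _ in range(episodes):
        averages = [totals[i] / counts[i] if counts[i] else None
                    for i in range(n)]
        chosen = select_policy(averages, epsilon, rng)
        totals[chosen] += episode_fns[chosen](rng)
        counts[chosen] += 1
    averages = [totals[i] / counts[i] if counts[i] else None
                for i in range(n)]
    return averages, counts
```

Under these assumptions, most episodes in a generation go to the policies that currently look best, while the occasional random choice keeps refining the noisy fitness estimates of the others. The resulting averages would then serve as fitness scores for the usual evolutionary selection and reproduction step.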
@InProceedings{GECCO06-shimon,
  author="Shimon Whiteson and Peter Stone",
  title="On-Line Evolutionary Computation for Reinforcement Learning in Stochastic Domains",
  booktitle="Proceedings of the Genetic and Evolutionary Computation Conference",
  month="July",
  year="2006",
  pages="1577--84",
  abstract={
    In \emph{reinforcement learning}, an agent interacting with its environment strives to learn a policy that specifies, for each state it may encounter, what action to take. Evolutionary computation is one of the most promising approaches to reinforcement learning but its success is largely restricted to \emph{off-line} scenarios. In \emph{on-line} scenarios, an agent must strive to maximize the reward it accrues \emph{while it is learning}. \emph{Temporal difference} (TD) methods, another approach to reinforcement learning, naturally excel in on-line scenarios because they have selection mechanisms for balancing the need to search for better policies (\emph{exploration}) with the need to accrue maximal reward (\emph{exploitation}). This paper presents a novel way to strike this balance in evolutionary methods by borrowing the selection mechanisms used by TD methods to choose individual actions and using them in evolution to choose policies for evaluation. Empirical results in the mountain car and server job scheduling domains demonstrate that these techniques can substantially improve evolution's on-line performance in stochastic domains.
  },
  wwwnote={<a href="http://www.sigevo.org/gecco-2006/">GECCO 2006</a>},
}