Our research in this area focuses on methods for evolving Neural Networks with Genetic Algorithms, i.e. Evolutionary Reinforcement Learning, or Neuroevolution. Compared to standard Reinforcement Learning, Neuroevolution is often more robust against noisy and incomplete input, and it allows representing continuous states and actions naturally. Our methods include utilizing subpopulations, population statistics, and knowledge stored in the population, as well as evolving network structure. Much of this research involves comparing neuroevolution to traditional methods in benchmark tasks such as pole balancing and mobile robot control.
This research is supported in part by the National Science Foundation under grant IIS-0083776 (and previously under IRI-9504317) and the Texas Higher Education Coordinating Board under grant ARP-003658-476-2001. Most of our projects are described below; for more details and for other projects, see the publications in Neuroevolution Methods. For related projects, see Neuroevolution Applications and Reinforcement Learning.
Many neuroevolution methods evolve fixed-topology networks. Some methods evolve topologies in addition to weights, but these usually place a bound on the complexity of the networks that can be evolved and begin evolution with random topologies. This project is based on a neuroevolution method called NeuroEvolution of Augmenting Topologies (NEAT) that can evolve networks of unbounded complexity from a minimal starting point. The initial stage of research aims to demonstrate that evolving topology can increase the efficiency of search by minimizing the dimensionality of the weight space. Several pole balancing experiments demonstrate that evolving topology with NEAT indeed provides such an advantage. However, the research has a broader goal of showing that evolving topologies is necessary to achieve three major goals of neuroevolution: (1) continual coevolution: successful competitive coevolution can use the evolution of topologies to continuously elaborate strategies; (2) evolution of adaptive networks: the evolution of topologies allows neuroevolution to evolve adaptive networks with plastic synapses by designating which connections should be adaptive and in what ways; (3) combining expert networks: separate expert neural networks can be fused through the evolution of connecting neurons between them. Because we want to show that growing structure is necessary to achieve these goals, it is important to have an efficient and principled method for evolving topologies available for experimentation; NEAT provides just such an experimental platform. NEAT is also an important contribution to GAs because it shows how evolution can both optimize and complexify solutions simultaneously, making it possible to evolve increasingly complex solutions over time and thereby strengthening the analogy with biological evolution.
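To illustrate the complexification mechanism, the sketch below implements the two structural mutations NEAT uses to grow networks from a minimal starting point: adding a connection between two existing nodes, and splitting an existing connection with a new node. The genome classes and innovation counter are simplified stand-ins; NEAT's speciation and historical-marking crossover are omitted.

import random

class ConnGene:
    """A connection gene: a weighted link plus a historical innovation number."""
    def __init__(self, src, dst, weight, innovation, enabled=True):
        self.src, self.dst = src, dst
        self.weight = weight
        self.innovation = innovation
        self.enabled = enabled

class Genome:
    """A NEAT-style genome: node ids plus a list of connection genes."""
    def __init__(self, nodes, conns):
        self.nodes = list(nodes)
        self.conns = list(conns)

innovation_counter = 0
def next_innovation():
    global innovation_counter
    innovation_counter += 1
    return innovation_counter

def mutate_add_connection(genome):
    """Connect two previously unconnected nodes with a new random weight."""
    existing = {(c.src, c.dst) for c in genome.conns}
    candidates = [(a, b) for a in genome.nodes for b in genome.nodes
                  if a != b and (a, b) not in existing]
    if not candidates:
        return
    src, dst = random.choice(candidates)
    genome.conns.append(ConnGene(src, dst, random.gauss(0, 1), next_innovation()))

def mutate_add_node(genome):
    """Split an existing connection: disable it and insert a new node in between."""
    enabled = [c for c in genome.conns if c.enabled]
    if not enabled:
        return
    old = random.choice(enabled)
    old.enabled = False
    new_node = max(genome.nodes) + 1
    genome.nodes.append(new_node)
    # The incoming link gets weight 1.0 and the outgoing link inherits the old
    # weight, so the new structure initially perturbs behavior as little as possible.
    genome.conns.append(ConnGene(old.src, new_node, 1.0, next_innovation()))
    genome.conns.append(ConnGene(new_node, old.dst, old.weight, next_innovation()))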
Most sequential decision tasks in the real world, such as
manufacturing and robot control, require short-term memory.
Such controllers are difficult to design by traditional engineering
or even conventional reinforcement learning methods
because the environments are often non-linear, high-dimensional,
stochastic, and non-stationary. Evolutionary methods can potentially
solve these difficult problems, but like these other approaches they require that
solutions be evaluated in simulation and then transferred to the
real world.
In order to successfully apply evolution to these tasks, two components
are required: (1) a learning method powerful enough to solve
problems of this difficulty in simulation, and (2) a methodology
that facilitates transfer to the real world.
The Enforced Subpopulations (ESP) method can be extended to evolving
multiple networks simultaneously, and applied to multi-agent problem
solving tasks. In the prey capture domain, multiple predators evolved to
perform different and compatible roles, so that the whole team of
predators efficiently captured the prey. Remarkably, multi-agent
evolution was more efficient than evolving a central controller for the
task. Also, the predators did not need to communicate or even know the
other predators' locations; role-based cooperation was highly efficient
in this task. Communication would result in more general, but less
effective, behavior. These results suggest that multi-agent
neuroevolution is a promising approach for complex real-world tasks.
We are currently working on applying it to robotic soccer and other multi-agent games.
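The sketch below illustrates the team-evaluation idea behind this multi-agent extension: each predator's controller is assembled from that predator's own subpopulations, the team is evaluated together in the task, and the resulting score is credited back to every participating neuron. The data structures and the placeholder episode are illustrative, not the actual Multi-Agent ESP implementation.

import random

def make_neuron(n_inputs):
    """A candidate hidden neuron: its connection weights plus fitness bookkeeping."""
    return {'weights': [random.gauss(0, 1) for _ in range(n_inputs)],
            'fitness_sum': 0.0, 'trials': 0}

def form_network(subpops):
    """Build one agent's network by picking one neuron from each of its subpopulations."""
    return [random.choice(sp) for sp in subpops]

def evaluate_team(team):
    """Placeholder for a prey-capture episode; returns one score for the whole team."""
    return random.random()

def evaluate_generation(agents, n_trials=100):
    """agents: one list of subpopulations per predator. Each trial assembles a
    full team, evaluates it jointly, and credits every participating neuron."""
    for _ in range(n_trials):
        team = [form_network(subpops) for subpops in agents]
        score = evaluate_team(team)
        for network in team:
            for neuron in network:
                neuron['fitness_sum'] += score
                neuron['trials'] += 1

# Three predators, each with four hidden-neuron subpopulations of 20 candidates.
agents = [[[make_neuron(n_inputs=5) for _ in range(20)] for _ in range(4)]
          for _ in range(3)]
evaluate_generation(agents)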
Any transmission of behavior from one generation to the next via a
non-genetic means is a process of culture. Culture provides major
advantages for survival in the biological world. In this project, four
methods were developed to harness the mechanisms of culture in
neuroevolution: culling overlarge litters, mate selection by
complementary competence, phenotypic diversity maintenance, and teaching
offspring to respond like an elder. The methods are efficient because
they operate without requiring additional fitness evaluations, and
because each method addresses a different aspect of neuroevolution, they
also combine synergetically. The combined system balances diversity and
selection pressure, and improves performance both in terms of learning
speed and solution quality in sequential decision tasks.
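As a rough illustration of how such mechanisms can work without extra fitness evaluations, the sketch below implements a generic version of litter culling: each mating produces an overlarge litter, and a cheap, evaluation-free criterion picks the single offspring to keep. The criterion used here (similarity of outputs on probe inputs to an elder network) is only one plausible choice and does not reproduce the exact criteria used in this project.

import random

def probe_behavior(network, probe_inputs):
    """Phenotypic signature: the network's outputs on a fixed set of probe inputs."""
    return [network(x) for x in probe_inputs]

def distance(sig_a, sig_b):
    return sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))

def cull_litter(parents, crossover, elder, probe_inputs, litter_size=8):
    """Produce an overlarge litter and keep the offspring whose behavior on the
    probe inputs is closest to a trusted elder -- a cheap filter that requires
    no task evaluations (illustrative criterion only)."""
    elder_sig = probe_behavior(elder, probe_inputs)
    litter = [crossover(*parents) for _ in range(litter_size)]
    return min(litter, key=lambda child: distance(probe_behavior(child, probe_inputs),
                                                  elder_sig))

# Tiny demo with scalar "networks" (functions of one input):
elder = lambda x: 2.0 * x
parents = (lambda x: 1.9 * x, lambda x: 2.2 * x)
crossover = lambda a, b: (lambda x, w=random.random(): w * a(x) + (1 - w) * b(x))
best = cull_litter(parents, crossover, elder, probe_inputs=[0.0, 0.5, 1.0])
print([best(x) for x in [0.0, 0.5, 1.0]])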
Although neuroevolution is powerful in discovering competent
neurocontrollers, it is difficult to achieve (1) high accuracy, and (2)
on-line adaptation to changes in the environment. In this project,
local adaptation using Particle Swarming is shown to solve both
problems. A competent neurocontroller is first evolved, and a population
consisting of slight modifications to it is then formed. This population
is further adapted as a swarm, allowing fine tuning and on-line response
to changes in the environment.
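A minimal sketch of the fine-tuning stage follows, using a textbook particle swarm optimization loop over weight vectors initialized as slight perturbations of the evolved controller; the fitness function and swarm constants are placeholders rather than the project's actual settings.

import random

def pso_fine_tune(seed_weights, fitness, n_particles=20, iterations=100,
                  jitter=0.1, w=0.7, c1=1.5, c2=1.5):
    """Fine-tune an evolved controller's weight vector with a basic PSO loop.
    'fitness' maps a weight vector to a score (higher is better)."""
    dim = len(seed_weights)
    # Initialize the swarm as slight modifications of the evolved controller.
    particles = [[wgt + random.gauss(0, jitter) for wgt in seed_weights]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in particles]
    personal_score = [fitness(p) for p in particles]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(iterations):
        for i, p in enumerate(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (personal_best[i][d] - p[d])
                                    + c2 * r2 * (global_best[d] - p[d]))
                p[d] += velocities[i][d]
            score = fitness(p)
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = p[:], score
                if score > global_score:
                    global_best, global_score = p[:], score
    return global_best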
In standard neuroevolution, the goal is to evolve a single neural
network that computes the desired answer. The confidence method
instead attempts to extract even better answers from the entire
population. One way to do this is to evolve networks that output not
only their answer, but also an estimate of that answer's correctness.
Experimental results in the handwritten digit recognition domain
suggest that such an evolutionary process, combined with an effective
technique for speciation, can create a population of networks that
performs better than any individual network.
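As an illustration, the sketch below combines such networks at classification time with a simple confidence-weighted vote; the actual combination rule and speciation mechanism used in the project may differ.

def classify_with_confidence(population, x):
    """Each network returns (predicted_digit, confidence in [0, 1]).
    Combine the population by confidence-weighted voting (illustrative rule)."""
    votes = {}
    for net in population:
        digit, confidence = net(x)
        votes[digit] = votes.get(digit, 0.0) + confidence
    return max(votes, key=votes.get)

# Example with three toy "networks" that always give the same answer:
population = [lambda x: (7, 0.9), lambda x: (1, 0.4), lambda x: (7, 0.2)]
print(classify_with_confidence(population, x=None))  # -> 7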
In standard evolutionary algorithms, new individuals are generated by
random mutation and recombination. In Eugenic Evolution, individuals are
instead systematically constructed to maximize fitness, based on historical
data on correlations between allele values and fitness. This method, the
Eugenic Algorithm (EuA), compares favorably to standard methods such as
Simulated Annealing and Genetic Algorithms in general combinatorial
optimization tasks. The eugenic principle has also been applied to the
evolution of neural networks in a method called EuSANE, where new
networks are systematically constructed from a pool of candidate
neurons. The EuA principle is further enhanced in the TEAM method, where
statistical models for each gene are maintained individually.
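The sketch below illustrates the basic eugenic idea on bit-string individuals: maintain a history of evaluated individuals, estimate which allele at each position has been associated with higher fitness, and construct new individuals accordingly. The statistics and exploration rule shown are simplifications; the actual EuA is considerably more elaborate.

import random

def allele_statistics(history, gene):
    """Mean fitness observed for allele 0 and allele 1 at a given gene position."""
    sums, counts = [0.0, 0.0], [0, 0]
    for individual, fitness in history:
        a = individual[gene]
        sums[a] += fitness
        counts[a] += 1
    return [sums[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]

def construct_individual(history, n_genes, exploration=0.05):
    """Build a new individual gene by gene, biased toward historically fitter alleles."""
    new = []
    for gene in range(n_genes):
        mean0, mean1 = allele_statistics(history, gene)
        best = 1 if mean1 > mean0 else 0
        # Occasionally take the other allele to keep exploring.
        new.append(best if random.random() > exploration else 1 - best)
    return new

# Toy usage: one-max fitness, history of 50 random evaluated individuals.
n = 10
fitness = lambda ind: sum(ind)
history = [(ind, fitness(ind)) for ind in
           ([random.randint(0, 1) for _ in range(n)] for _ in range(50))]
print(construct_individual(history, n))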
In this project we developed an Evolutionary Reinforcement Learning
method called SANE (Symbiotic, Adaptive Neuro-Evolution) where a
population of neurons is evolved to form a neural network for a
sequential decision task. Symbiotic evolution promotes both cooperation
and specialization in the population, which results in a fast, efficient
genetic search and discourages convergence to suboptimal solutions.
SANE was shown to be faster and more powerful than other reinforcement
learning methods in the pole-balancing and mobile robot benchmark tasks,
leading to several novel applications.
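A sketch of SANE's neuron-level credit assignment follows: networks are repeatedly assembled from random subsets of the shared neuron population, each network is evaluated on the task, and each neuron receives the average fitness of the networks in which it participated. Encodings, trial counts, and selection details are simplified.

import random

def make_neuron(n_in, n_out):
    """A hidden neuron defined by its input and output connection weights."""
    return {'w_in': [random.gauss(0, 1) for _ in range(n_in)],
            'w_out': [random.gauss(0, 1) for _ in range(n_out)],
            'score': 0.0, 'trials': 0}

def evaluate_population(population, evaluate_network, hidden_size=8, n_networks=200):
    """Assemble many random networks from the shared neuron population and
    pass credit back to the participating neurons."""
    for _ in range(n_networks):
        hidden = random.sample(population, hidden_size)
        fitness = evaluate_network(hidden)       # run the task with this network
        for neuron in hidden:
            neuron['score'] += fitness
            neuron['trials'] += 1
    for neuron in population:
        if neuron['trials']:
            neuron['score'] /= neuron['trials']  # average fitness over participations

# Toy usage with a dummy task evaluation:
population = [make_neuron(4, 2) for _ in range(100)]
evaluate_population(population, evaluate_network=lambda hidden: random.random())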
(Faustino Gomez, since 1996)
I have developed a neuroevolution algorithm,
Enforced SubPopulations (ESP), that extends SANE
by allowing neurons to evolve recurrent connections and, therefore,
use information about past experience (i.e. memory) to make
decisions. Because of
sensory limitations, it is not always possible for the control system
to identify the state directly; instead, the system must make use of
its perceptual history to disambiguate the state. Conventional
learning methods such as Q-learning do not work well in such
non-Markov environments. However, neuroevolution has recently been shown
to be a very promising alternative. In this work I explore an
approach for solving continuous, non-Markov control tasks that is
composed of two separate parts: (1) the ESP neuroevolution method
described above, and (2) an Incremental Evolution approach that allows
evolutionary methods to solve hard tasks by evolving on a sequence of
increasingly difficult tasks. The method has been tested on several
Markov and non-Markov versions of the pole balancing problem, as well as
on evolving general behavior in the prey capture task. The
results show that ESP with Incremental Evolution is more efficient than
other methods and can solve harder versions of the tasks.
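The incremental part of the approach can be summarized in a few lines: evolve on a schedule of progressively harder task configurations, moving on to the next configuration once the population reaches a goal fitness. The schedule and threshold below are placeholders for, e.g., pole-balancing variants of increasing difficulty.

def incremental_evolution(population, evolve_one_generation, task_schedule,
                          goal_fitness, max_generations=1000):
    """Evolve on a sequence of increasingly difficult tasks, transferring the
    population from each task to the next once it reaches the goal fitness.
    'evolve_one_generation(population, task)' must return (population, best_fitness)."""
    for task in task_schedule:          # e.g. progressively harder pole-balancing setups
        for _ in range(max_generations):
            population, best = evolve_one_generation(population, task)
            if best >= goal_fitness:
                break                   # move on to the harder task
    return population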
Because it is impractical to evaluate entire populations of
controllers in the real world, evolutionary approaches are just as
dependent on simulation as other reinforcement learning methods.
Controllers must first be learned off-line in a simulation environment
and then be transferred to the actual target environment where they
are ultimately meant to operate. To ensure that transfer is possible,
evolved controllers need to be robust enough to cope with discrepancies
between these two settings.
So far, transfer of evolved mobile robot controllers has been shown to
be possible, but there is very little research on transfer in other
classes of tasks, such as the control of unstable systems. A
second goal of this work is to analyze what factors influence
transfer and to show that transfer is possible even in high-precision
tasks in unstable environments, such as the most difficult pole
balancing task.
However, no matter how rigorously they are developed,
simulators cannot faithfully model all aspects of a target
environment. Whenever the target environment is abstracted in some
way to simplify evaluation, spurious features are introduced into the
simulation. If a controller relies on these features to accomplish
the task, it will fail to transfer to the real world where the
features are not available \cite{mataric:ras96}. Since some
abstraction is necessary to make simulators tractable, such a
"reality gap" can prevent controllers from performing in the
physical world as they do in simulation.
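One generic way to narrow this gap, not necessarily the technique used in this project, is to score each controller over several episodes with randomly perturbed simulation parameters and noisy observations, so that evolution favors controllers that do not depend on any single model of the environment. A minimal sketch, with a placeholder episode function:

import random

def robust_fitness(controller, run_episode, n_trials=5,
                   param_jitter=0.1, sensor_noise=0.05):
    """Average a controller's score over episodes whose physical parameters are
    randomly perturbed and whose observations are corrupted with noise, so that
    high fitness requires robustness to simulator inaccuracies (illustrative only)."""
    total = 0.0
    for _ in range(n_trials):
        # Perturb assumed simulator parameters, e.g. pole length and cart mass.
        params = {'pole_length': 0.5 * (1 + random.uniform(-param_jitter, param_jitter)),
                  'cart_mass':   1.0 * (1 + random.uniform(-param_jitter, param_jitter))}
        noise = lambda obs: [o + random.gauss(0, sensor_noise) for o in obs]
        total += run_episode(controller, params, noise)
    return total / n_trials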
(Chern Han Yong, Shimon Whiteson, Nate Kohl,
Bobby Bryant, since 2000)
(Paul McQuesten,
1998-2002)
(Alex Conradie,
2001-2002)
(Joseph Bruce, since 2000)
(John Prior, Daniel
Polani, Aard-Jan van Kesteren, and Matt Alden, since 1998)
(David Moriarty, 1994;1997)
In a marker-based encoding of a neural network, each neuron definition consists of a collection of connections specified between a start marker and an end marker in the chromosome. This mechanism allows all aspects of the network structure, including the number of nodes and their connectivity, to be evolved through genetic algorithms. The search is free to utilize the material between neuron definitions, which allows for drastic exploration of the solution space. The method has been shown to be effective in learning finite-state behavior in an artificial environment and in learning strategies for the game of Othello.
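The sketch below shows a simplified decoder for such a chromosome: integer genes are scanned for start and end markers, and the genes between each marker pair are read as one neuron's connection definitions. The marker values and the (target, weight) gene layout are illustrative assumptions, not the exact encoding used in this work.

START, END = 255, 254   # illustrative marker values

def decode(chromosome):
    """Scan an integer chromosome and extract neuron definitions lying between
    START and END markers; genes outside marker pairs are ignored (but remain
    available to evolution). Each neuron is a list of (target, weight) pairs
    read from consecutive gene pairs (illustrative layout)."""
    neurons, i = [], 0
    while i < len(chromosome):
        if chromosome[i] == START:
            j = i + 1
            body = []
            while j < len(chromosome) and chromosome[j] != END:
                body.append(chromosome[j])
                j += 1
            # Interpret the body as (target node, weight) pairs.
            connections = [(body[k], (body[k + 1] - 128) / 32.0)
                           for k in range(0, len(body) - 1, 2)]
            neurons.append(connections)
            i = j
        i += 1
    return neurons

# Example: one neuron with two connections, surrounded by unused genes.
print(decode([7, START, 3, 160, 5, 96, END, 42]))  # -> [[(3, 1.0), (5, -1.0)]]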