Our research in this area focuses on methods for evolving Neural Networks with Genetic Algorithms, i.e. Evolutionary Reinforcement Learning, or Neuroevolution. Compared to standard Reinforcement Learning, Neuroevolution is often more robust against noisy and incomplete input, and it allows representing continuous states and actions naturally. Our methods include utilizing subpopulations, population statistics, and knowledge stored in the population, as well as evolving network structure. Much of this research involves comparing neuroevolution to traditional methods in benchmark tasks such as pole balancing and mobile robot control.
This research is supported in part by the National Science Foundation under grant IIS-0083776 (and previously under IRI-9504317) and the Texas Higher Education Coordinating Board under grant ARP-003658-476-2001. Most of our projects are described below; for more details and for other projects, see the publications in Neuroevolution Methods. For related projects, see Neuroevolution Applications and Reinforcement Learning.
Many neuroevolution methods evolve fixed-topology networks. Some methods evolve topologies in addition to weights, but these usually place a bound on the complexity of the networks that can be evolved and begin evolution with random topologies. This project is based on a neuroevolution method called NeuroEvolution of Augmenting Topologies (NEAT) that can evolve networks of unbounded complexity from a minimal starting point. The initial stage of research aims to demonstrate that evolving topology can increase the efficiency of search by minimizing the dimensionality of the weight space. Several pole balancing experiments demonstrate that evolving topology with NEAT indeed provides such an advantage. However, the research has a broader goal of showing that evolving topologies is necessary to achieve three major goals of neuroevolution: (1) continual coevolution: successful competitive coevolution can use the evolution of topologies to continuously elaborate strategies; (2) evolution of adaptive networks: the evolution of topologies allows neuroevolution to evolve adaptive networks with plastic synapses by designating which connections should be adaptive and in what ways; (3) combining expert networks: separate expert neural networks can be fused through the evolution of connecting neurons between them. Because we want to show that growing structure is necessary to achieve these goals, it is important to have an efficient and principled method for evolving topologies available for experimentation; NEAT provides just such an experimental platform. NEAT is also an important contribution to GAs because it shows how evolution can both optimize and complexify solutions simultaneously, making it possible to evolve increasingly complex solutions over time and thereby strengthening the analogy with biological evolution.
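To illustrate the complexification mechanism, the sketch below implements the two structural mutations NEAT uses to grow networks from a minimal starting point: adding a connection between two existing nodes, and splitting an existing connection with a new node. The genome classes and innovation counter are simplified stand-ins; NEAT's speciation and historical-marking crossover are omitted.

import random

class ConnGene:
    """A connection gene: a weighted link plus a historical innovation number."""
    def __init__(self, src, dst, weight, innovation, enabled=True):
        self.src, self.dst = src, dst
        self.weight = weight
        self.innovation = innovation
        self.enabled = enabled

class Genome:
    """A NEAT-style genome: node ids plus a list of connection genes."""
    def __init__(self, nodes, conns):
        self.nodes = list(nodes)
        self.conns = list(conns)

innovation_counter = 0
def next_innovation():
    global innovation_counter
    innovation_counter += 1
    return innovation_counter

def mutate_add_connection(genome):
    """Connect two previously unconnected nodes with a new random weight."""
    existing = {(c.src, c.dst) for c in genome.conns}
    candidates = [(a, b) for a in genome.nodes for b in genome.nodes
                  if a != b and (a, b) not in existing]
    if not candidates:
        return
    src, dst = random.choice(candidates)
    genome.conns.append(ConnGene(src, dst, random.gauss(0, 1), next_innovation()))

def mutate_add_node(genome):
    """Split an existing connection: disable it and insert a new node in between."""
    enabled = [c for c in genome.conns if c.enabled]
    if not enabled:
        return
    old = random.choice(enabled)
    old.enabled = False
    new_node = max(genome.nodes) + 1
    genome.nodes.append(new_node)
    # The incoming link gets weight 1.0 and the outgoing link inherits the old
    # weight, so the new structure initially perturbs behavior as little as possible.
    genome.conns.append(ConnGene(old.src, new_node, 1.0, next_innovation()))
    genome.conns.append(ConnGene(new_node, old.dst, old.weight, next_innovation()))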
Most sequential decision tasks in the real world, such as
manufacturing and robot control, require short-term memory.
Such controllers are difficult to design by traditional engineering
or even conventional reinforcement learning methods
because the environments are often non-linear, high-dimensional,
stochastic, and non-stationary. Evolutionary methods can potentially
solve these difficult problems, but like these other approaches they require that
solutions be evaluated in simulation and then transferred to the
real world.
In order to successfully apply evolution to these tasks, two components
are required: (1) a learning method powerful enough to solve
problems of this difficulty in simulation, and (2) a methodology
that facilitates transfer to the real world.
The Enforced Subpopulations (ESP) method can be extended to evolving
multiple networks simultaneously, and applied to multi-agent problem
solving tasks. In the prey capture domain, multiple predators evolved to
perform different and compatible roles, so that the whole team of
predators efficiently captured the prey. Remarkably, multi-agent
evolution was more efficient than evolving a central controller for the
task. Also, the predators did not need to communicate or even know the
other predators' locations; role-based cooperation was highly efficient
in this task. Communication would result in more general, but less
effective, behavior. These results suggest that multi-agent
neuroevolution is a promising approach for complex real-world tasks.
We are currently working on applying it to robotic soccer and other multi-agent games.
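The sketch below illustrates the team-evaluation idea behind this multi-agent extension: each predator's controller is assembled from that predator's own subpopulations, the team is evaluated together in the task, and the resulting score is credited back to every participating neuron. The data structures and the placeholder episode are illustrative, not the actual Multi-Agent ESP implementation.

import random

def make_neuron(n_inputs):
    """A candidate hidden neuron: its connection weights plus fitness bookkeeping."""
    return {'weights': [random.gauss(0, 1) for _ in range(n_inputs)],
            'fitness_sum': 0.0, 'trials': 0}

def form_network(subpops):
    """Build one agent's network by picking one neuron from each of its subpopulations."""
    return [random.choice(sp) for sp in subpops]

def evaluate_team(team):
    """Placeholder for a prey-capture episode; returns one score for the whole team."""
    return random.random()

def evaluate_generation(agents, n_trials=100):
    """agents: one list of subpopulations per predator. Each trial assembles a
    full team, evaluates it jointly, and credits every participating neuron."""
    for _ in range(n_trials):
        team = [form_network(subpops) for subpops in agents]
        score = evaluate_team(team)
        for network in team:
            for neuron in network:
                neuron['fitness_sum'] += score
                neuron['trials'] += 1

# Three predators, each with four hidden-neuron subpopulations of 20 candidates.
agents = [[[make_neuron(n_inputs=5) for _ in range(20)] for _ in range(4)]
          for _ in range(3)]
evaluate_generation(agents)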
Any transmission of behavior from one generation to the next via a
non-genetic means is a process of culture. Culture provides major
advantages for survival in the biological world. In this project, four
methods were developed to harness the mechanisms of culture in
neuroevolution: culling overlarge litters, mate selection by
complementary competence, phenotypic diversity maintenance, and teaching
offspring to respond like an elder. The methods are efficient because
they operate without requiring additional fitness evaluations, and
because each method addresses a different aspect of neuroevolution, they
also combine synergetically. The combined system balances diversity and
selection pressure, and improves performance both in terms of learning
speed and solution quality in sequential decision tasks.
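As a rough illustration of how such mechanisms can work without extra fitness evaluations, the sketch below implements a generic version of litter culling: each mating produces an overlarge litter, and a cheap, evaluation-free criterion picks the single offspring to keep. The criterion used here (similarity of outputs on probe inputs to an elder network) is only one plausible choice and does not reproduce the exact criteria used in this project.

import random

def probe_behavior(network, probe_inputs):
    """Phenotypic signature: the network's outputs on a fixed set of probe inputs."""
    return [network(x) for x in probe_inputs]

def distance(sig_a, sig_b):
    return sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))

def cull_litter(parents, crossover, elder, probe_inputs, litter_size=8):
    """Produce an overlarge litter and keep the offspring whose behavior on the
    probe inputs is closest to a trusted elder -- a cheap filter that requires
    no task evaluations (illustrative criterion only)."""
    elder_sig = probe_behavior(elder, probe_inputs)
    litter = [crossover(*parents) for _ in range(litter_size)]
    return min(litter, key=lambda child: distance(probe_behavior(child, probe_inputs),
                                                  elder_sig))

# Tiny demo with scalar "networks" (functions of one input):
elder = lambda x: 2.0 * x
parents = (lambda x: 1.9 * x, lambda x: 2.2 * x)
crossover = lambda a, b: (lambda x, w=random.random(): w * a(x) + (1 - w) * b(x))
best = cull_litter(parents, crossover, elder, probe_inputs=[0.0, 0.5, 1.0])
print([best(x) for x in [0.0, 0.5, 1.0]])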
Although neuroevolution is powerful in discovering competent
neurocontrollers, it is difficult to achieve (1) high accuracy, and (2)
on-line adaptation to changes in the environment. In this project,
local adaptation using Particle Swarming is shown to solve both
problems. A competent neurocontroller is first evolved, and a population
consisting of slight modifications to it is then formed. This population
is further adapted as a swarm, allowing fine tuning and on-line response
to changes in the environment.
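A minimal sketch of the fine-tuning stage follows, using a textbook particle swarm optimization loop over weight vectors initialized as slight perturbations of the evolved controller; the fitness function and swarm constants are placeholders rather than the project's actual settings.

import random

def pso_fine_tune(seed_weights, fitness, n_particles=20, iterations=100,
                  jitter=0.1, w=0.7, c1=1.5, c2=1.5):
    """Fine-tune an evolved controller's weight vector with a basic PSO loop.
    'fitness' maps a weight vector to a score (higher is better)."""
    dim = len(seed_weights)
    # Initialize the swarm as slight modifications of the evolved controller.
    particles = [[wgt + random.gauss(0, jitter) for wgt in seed_weights]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in particles]
    personal_score = [fitness(p) for p in particles]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(iterations):
        for i, p in enumerate(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (personal_best[i][d] - p[d])
                                    + c2 * r2 * (global_best[d] - p[d]))
                p[d] += velocities[i][d]
            score = fitness(p)
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = p[:], score
                if score > global_score:
                    global_best, global_score = p[:], score
    return global_best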
In standard neuroevolution, the goal is to evolve a single neural
network that computes the desired answer. The confidence method
instead attempts to extract even better answers from the entire
population. One way to do this is to evolve networks that output not
only their answer, but also an estimate of that answer's correctness.
Experimental results in the handwritten digit recognition domain
suggest that such an evolutionary process, combined with an effective
technique for speciation, can create a population of networks that
performs better than any individual network.
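As an illustration, the sketch below combines such networks at classification time with a simple confidence-weighted vote; the actual combination rule and speciation mechanism used in the project may differ.

def classify_with_confidence(population, x):
    """Each network returns (predicted_digit, confidence in [0, 1]).
    Combine the population by confidence-weighted voting (illustrative rule)."""
    votes = {}
    for net in population:
        digit, confidence = net(x)
        votes[digit] = votes.get(digit, 0.0) + confidence
    return max(votes, key=votes.get)

# Example with three toy "networks" that always give the same answer:
population = [lambda x: (7, 0.9), lambda x: (1, 0.4), lambda x: (7, 0.2)]
print(classify_with_confidence(population, x=None))  # -> 7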
In standard evolutionary algorithms, new individuals are generated by
random mutation and recombination. In Eugenic Evolution, individuals are
instead systematically constructed to maximize fitness, based on historical
data on correlations between allele values and fitness. This method, the
Eugenic Algorithm (EuA), compares favorably to standard methods such as
Simulated Annealing and Genetic Algorithms in general combinatorial
optimization tasks. The eugenic principle has also been applied to the
evolution of neural networks in a method called EuSANE, where new
networks are systematically constructed from a pool of candidate
neurons. The EuA principle is further enhanced in the TEAM method, where
statistical models for each gene are maintained individually.
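The sketch below illustrates the basic eugenic idea on bit-string individuals: maintain a history of evaluated individuals, estimate which allele at each position has been associated with higher fitness, and construct new individuals accordingly. The statistics and exploration rule shown are simplifications; the actual EuA is considerably more elaborate.

import random

def allele_statistics(history, gene):
    """Mean fitness observed for allele 0 and allele 1 at a given gene position."""
    sums, counts = [0.0, 0.0], [0, 0]
    for individual, fitness in history:
        a = individual[gene]
        sums[a] += fitness
        counts[a] += 1
    return [sums[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]

def construct_individual(history, n_genes, exploration=0.05):
    """Build a new individual gene by gene, biased toward historically fitter alleles."""
    new = []
    for gene in range(n_genes):
        mean0, mean1 = allele_statistics(history, gene)
        best = 1 if mean1 > mean0 else 0
        # Occasionally take the other allele to keep exploring.
        new.append(best if random.random() > exploration else 1 - best)
    return new

# Toy usage: one-max fitness, history of 50 random evaluated individuals.
n = 10
fitness = lambda ind: sum(ind)
history = [(ind, fitness(ind)) for ind in
           ([random.randint(0, 1) for _ in range(n)] for _ in range(50))]
print(construct_individual(history, n))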
In this project we developed an Evolutionary Reinforcement Learning
method called SANE (Symbiotic, Adaptive Neuro-Evolution) where a
population of neurons is evolved to form a neural network for a
sequential decision task. Symbiotic evolution promotes both cooperation
and specialization in the population, which results in a fast, efficient
genetic search and discourages convergence to suboptimal solutions.
SANE was shown to be faster and more powerful than other reinforcement
learning methods in the pole-balancing and mobile robot benchmark tasks,
leading to several novel applications.
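A sketch of SANE's neuron-level credit assignment follows: networks are repeatedly assembled from random subsets of the shared neuron population, each network is evaluated on the task, and each neuron receives the average fitness of the networks in which it participated. Encodings, trial counts, and selection details are simplified.

import random

def make_neuron(n_in, n_out):
    """A hidden neuron defined by its input and output connection weights."""
    return {'w_in': [random.gauss(0, 1) for _ in range(n_in)],
            'w_out': [random.gauss(0, 1) for _ in range(n_out)],
            'score': 0.0, 'trials': 0}

def evaluate_population(population, evaluate_network, hidden_size=8, n_networks=200):
    """Assemble many random networks from the shared neuron population and
    pass credit back to the participating neurons."""
    for _ in range(n_networks):
        hidden = random.sample(population, hidden_size)
        fitness = evaluate_network(hidden)       # run the task with this network
        for neuron in hidden:
            neuron['score'] += fitness
            neuron['trials'] += 1
    for neuron in population:
        if neuron['trials']:
            neuron['score'] /= neuron['trials']  # average fitness over participations

# Toy usage with a dummy task evaluation:
population = [make_neuron(4, 2) for _ in range(100)]
evaluate_population(population, evaluate_network=lambda hidden: random.random())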
(Faustino Gomez, since 1996)
I have developed a neuroevolution algorithm,
Enforced SubPopulations (ESP), that extends SANE
by allowing neurons to evolve recurrent connections and, therefore,
use information about past experience (i.e. memory) to make
decisions. Because of
sensory limitations, it is not always possible for the control system
to identify the state directly; instead, the system must make use of
its perceptual history to disambiguate the state. Conventional
learning methods such as Q-learning do not work well in such
non-Markov environments. However, neuroevolution has recently been shown
to be a very promising alternative. In this work I explore an
approach for solving continuous, non-Markov control tasks that is
composed of two separate parts: (1) the ESP neuroevolution method
described above, and (2) an Incremental Evolution approach that allows
evolutionary methods to solve hard tasks by evolving on a sequence of
increasingly difficult tasks. The method has been tested on several
Markov and non-Markov versions of the pole balancing problem, as well as
on evolving general behavior in the prey capture task. The
results show that ESP with Incremental Evolution is more efficient than
other methods and can solve harder versions of the tasks.
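The incremental part of the approach can be summarized in a few lines: evolve on a schedule of progressively harder task configurations, moving on to the next configuration once the population reaches a goal fitness. The schedule and threshold below are placeholders for, e.g., pole-balancing variants of increasing difficulty.

def incremental_evolution(population, evolve_one_generation, task_schedule,
                          goal_fitness, max_generations=1000):
    """Evolve on a sequence of increasingly difficult tasks, transferring the
    population from each task to the next once it reaches the goal fitness.
    'evolve_one_generation(population, task)' must return (population, best_fitness)."""
    for task in task_schedule:          # e.g. progressively harder pole-balancing setups
        for _ in range(max_generations):
            population, best = evolve_one_generation(population, task)
            if best >= goal_fitness:
                break                   # move on to the harder task
    return population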
Because it is impractical to evaluate entire populations of
controllers in the real world, evolutionary approaches are just as
dependent on simulation as other reinforcement learning methods.
Controllers must first be learned off-line in a simulation environment
and then be transferred to the actual target environment where they
are ultimately meant to operate. To ensure that transfer is possible,
evolved controllers need to be robust enough to cope with discrepancies
between these two settings.
So far, transfer of evolved mobile robot controllers has been shown to
be possible, but there is very little research on transfer in other
classes of tasks, such as the control of unstable systems. A
second goal of this work is to analyze what factors influence
transfer and to show that transfer is possible even in high-precision
tasks in unstable environments, such as the most difficult pole
balancing task.
However, no matter how rigorously they are developed,
simulators cannot faithfully model all aspects of a target
environment. Whenever the target environment is abstracted in some
way to simplify evaluation, spurious features are introduced into the
simulation. If a controller relies on these features to accomplish
the task, it will fail to transfer to the real world where the
features are not available \cite{mataric:ras96}. Since some
abstraction is necessary to make simulators tractable, such a
"reality gap" can prevent controllers from performing in the
physical world as they do in simulation.
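One generic way to narrow this gap, not necessarily the technique used in this project, is to score each controller over several episodes with randomly perturbed simulation parameters and noisy observations, so that evolution favors controllers that do not depend on any single model of the environment. A minimal sketch, with a placeholder episode function:

import random

def robust_fitness(controller, run_episode, n_trials=5,
                   param_jitter=0.1, sensor_noise=0.05):
    """Average a controller's score over episodes whose physical parameters are
    randomly perturbed and whose observations are corrupted with noise, so that
    high fitness requires robustness to simulator inaccuracies (illustrative only)."""
    total = 0.0
    for _ in range(n_trials):
        # Perturb assumed simulator parameters, e.g. pole length and cart mass.
        params = {'pole_length': 0.5 * (1 + random.uniform(-param_jitter, param_jitter)),
                  'cart_mass':   1.0 * (1 + random.uniform(-param_jitter, param_jitter))}
        noise = lambda obs: [o + random.gauss(0, sensor_noise) for o in obs]
        total += run_episode(controller, params, noise)
    return total / n_trials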
(Chern Han Yong, Shimon Whiteson, Nate Kohl,
Bobby Bryant, since 2000)
(Paul McQuesten,
1998-2002)
(Alex Conradie,
2001-2002)
(Joseph Bruce, since 2000)
(John Prior, Daniel
Polani, Aard-Jan van Kesteren, and Matt Alden, since 1998)
(David Moriarty, 1994;1997)
In a marker-based encoding of a neural network, each neuron definition consists of a collection of connections specified between a start marker and an end marker in the chromosome. This mechanism allows all aspects of the network structure, including the number of nodes and their connectivity, to be evolved through genetic algorithms. The search is free to utilize the material between neuron definitions, which allows for drastic exploration of the solution space. The method has been shown to be effective in learning finite-state behavior in an artificial environment and in learning strategies for the game of Othello.
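The sketch below shows a simplified decoder for such a chromosome: integer genes are scanned for start and end markers, and the genes between each marker pair are read as one neuron's connection definitions. The marker values and the (target, weight) gene layout are illustrative assumptions, not the exact encoding used in this work.

START, END = 255, 254   # illustrative marker values

def decode(chromosome):
    """Scan an integer chromosome and extract neuron definitions lying between
    START and END markers; genes outside marker pairs are ignored (but remain
    available to evolution). Each neuron is a list of (target, weight) pairs
    read from consecutive gene pairs (illustrative layout)."""
    neurons, i = [], 0
    while i < len(chromosome):
        if chromosome[i] == START:
            j = i + 1
            body = []
            while j < len(chromosome) and chromosome[j] != END:
                body.append(chromosome[j])
                j += 1
            # Interpret the body as (target node, weight) pairs.
            connections = [(body[k], (body[k + 1] - 128) / 32.0)
                           for k in range(0, len(body) - 1, 2)]
            neurons.append(connections)
            i = j
        i += 1
    return neurons

# Example: one neuron with two connections, surrounded by unused genes.
print(decode([7, START, 3, 160, 5, 96, END, 42]))  # -> [[(3, 1.0), (5, -1.0)]]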