(logo by Janette Forte)
This is the official site of the UT Austin Villa 3D Simulation team from the Department of Computer Science at the University of Texas at Austin.

This web page provides supplementary material to the following article:

Overlapping Layered Learning

Patrick MacAlpine and Peter Stone

Published in Artificial Intelligence (AIJ) 254:21-43, Elsevier, January 2018.

The full article can be found here.


This page describes how behaviors for getting up, walking, and kicking were optimized. This optimization process was a key component of UT Austin Villa winning the 2014 RoboCup 3D simulation competition. Results from the competition, including videos of game action, are linked from the UT Austin Villa homepage. The remainder of this page focuses only on the learning of behaviors.

Layered Learning is a hierarchical machine learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors. A key feature of layered learning is that higher layers directly depend on the learned lower layers. In its original formulation, lower layers were frozen prior to learning higher layers. This work considers an extension to the paradigm that allows learning certain behaviors independently, and then later stitching them together by learning at the "seams" where their influences overlap.
Sequential Layered Learning (SLL): Hierarchical learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors (each learned sub-behavior is a layer in the learning progression). Higher layers depend on lower layers for learning. This dependence can include providing features for learning, such as seed values for parameters, as well as incorporating a previously learned layer’s behavior into the learning task for the next layer. In its original formulation, layers are learned in a sequential bottom-up fashion, and after a layer is learned it is frozen before learning of the next layer begins.
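The freeze-as-you-go schedule of SLL can be sketched in a few lines, assuming each layer is a named group of real-valued parameters scored by its own objective. The page does not name the optimizer used, so the stochastic hill climber below is only a stand-in, and `hill_climb`, `sequential_layered_learning`, and the two toy objectives are hypothetical.

```python
import random


def hill_climb(objective, params, open_keys, iters=2000, step=0.1, seed=0):
    """Toy stochastic hill climber (a stand-in optimizer, not the article's).

    Only the parameters named in open_keys are perturbed; all others keep
    their current values."""
    rng = random.Random(seed)
    best, best_score = dict(params), objective(params)
    for _ in range(iters):
        candidate = dict(best)
        for key in open_keys:
            candidate[key] += rng.gauss(0.0, step)
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def sequential_layered_learning(layers, objectives, init):
    """layers[i] names the parameters introduced by layer i; objectives[i]
    scores the behavior at layer i. Layers are learned bottom-up, and a
    layer's parameters are never perturbed again once learned (frozen)."""
    params, frozen = dict(init), set()
    for new_keys, objective in zip(layers, objectives):
        params = hill_climb(objective, params, new_keys)
        frozen.update(new_keys)
    return params, frozen


# Hypothetical two-layer example: layer 1 learns "a", layer 2 learns "b".
def layer1(p):
    return -(p["a"] - 1.0) ** 2


def layer2(p):
    # Layer 2 would prefer a = 1.5, but "a" is frozen near 1.0 under SLL.
    return -(p["b"] - 2.0 * p["a"]) ** 2 - (p["a"] - 1.5) ** 2


learned, frozen = sequential_layered_learning(
    [["a"], ["b"]], [layer1, layer2], {"a": 0.0, "b": 0.0}
)
```

In the toy example, the second layer's objective would be better served by moving `a`, but freezing prevents it; that restriction is what the concurrent and overlapping variants below relax.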

Concurrent Layered Learning (CLL): Purposely does not freeze newly learned layers, but instead keeps them open during learning of subsequent layers. This is done so that learning may enter areas of the behavior search space that are closer to the combined layers’ optimum behavior, as opposed to being confined to areas of the joint layer search space where the behaviors of previously learned layers are fixed. While concurrent layered learning does not restrict the search space in the way that freezing learned layers does, the increase in the search space’s dimensionality can make learning slower and more difficult.
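The dimensionality cost can be made concrete by counting how many parameters are open at each stage under the two schedules. The layer sizes in the usage note are hypothetical, not the article's actual counts.

```python
def open_dimensions(layer_sizes, paradigm):
    """Number of parameters being optimized at each stage of learning.

    layer_sizes[i] is the parameter count of layer i.
    'sequential': only the newest layer is open (earlier layers frozen).
    'concurrent': every layer learned so far stays open.
    """
    dims = []
    for i, size in enumerate(layer_sizes):
        if paradigm == "sequential":
            dims.append(size)
        elif paradigm == "concurrent":
            dims.append(sum(layer_sizes[: i + 1]))
        else:
            raise ValueError(f"unknown paradigm: {paradigm}")
    return dims
```

For three hypothetical layers of 10, 20, and 15 parameters, sequential learning searches spaces of 10, 20, and 15 dimensions, while concurrent learning's final stage searches all 45 at once.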



Different paradigms for layered learning with layers or parts of layers being learned shown in red.

Overlapping Layered Learning seeks to find a tradeoff between freezing each layer once learning is complete and leaving previously learned layers open. It does so by keeping some, but not necessarily all, parts of previously learned layers open during learning of subsequent layers. The part of previously learned layers left open is the "overlap" with the next layer being learned.

Combining Independently Learned Behaviors (CILB): Two or more behaviors are learned independently in the same layer, and are then combined into a joint behavior at the subsequent layer by relearning some subset of the behaviors’ parameters, the "seam" between the behaviors. This scenario is best when subtask behaviors are too complex, or would interfere with each other during learning, such that they must be learned independently, yet ultimately need to work together on a combined task.

Partial Concurrent Layered Learning (PCLL): Only part, but not all, of a previously learned layer’s behavior parameters are left open when learning a subsequent layer with new parameters. The part of the previously learned layer’s parameters left open is the "seam" between the layers. Partial concurrent learning is beneficial when full concurrent learning unnecessarily increases the dimensionality of the search space to the point that it hinders learning, while completely freezing the previous layer limits how well the layers can perform together.

Previous Learned Layer Refinement (PLLR): After a layer is learned and frozen and a subsequent layer is then learned, part or all of the previously learned layer is unfrozen and relearned to work better with the newly learned layer, which is now fully or partially frozen. We treat re-optimizing a previously frozen layer under new conditions as a new learned layer behavior, with the "seam" between behaviors being the unfrozen part of the previously learned layer. This scenario is useful when a subtask must be learned before the next task layer can be learned, but refining or relearning the original layer afterward helps it work better with the newly learned layer.
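The three overlapping scenarios differ mainly in which parameters are open during a learning stage. A minimal sketch, assuming parameters are identified by name; `overlap_stage` and the parameter names in the usage note are hypothetical. PLLR may also leave part of the newer layer open, but this sketch freezes all of it for simplicity.

```python
def overlap_stage(paradigm, previous, seam, new=frozenset()):
    """Split parameters into (open, frozen) sets for one learning stage.

    previous: parameters of earlier learned behavior(s); for CILB, the union
              of the independently learned behaviors being combined
    seam:     the subset of `previous` left open (the "overlap")
    new:      parameters of the layer being added (PCLL) or of the layer just
              learned and now frozen (PLLR); unused for CILB
    """
    previous, seam, new = set(previous), set(seam), set(new)
    if not seam <= previous:
        raise ValueError("the seam must be drawn from previously learned parameters")
    if paradigm in ("CILB", "PLLR"):
        # Only the seam is relearned; everything else is held fixed.
        open_params = seam
    elif paradigm == "PCLL":
        # The new layer is learned together with part of the previous layer.
        open_params = new | seam
    else:
        raise ValueError(f"unknown paradigm: {paradigm}")
    return open_params, (previous | new) - open_params
```

For example, `overlap_stage("PCLL", {"a1", "a2", "a3"}, {"a3"}, new={"b1", "b2"})` opens `a3`, `b1`, and `b2` while keeping `a1` and `a2` frozen, which sits between SLL (only `b1`, `b2` open) and CLL (all five open).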
For the 2014 RoboCup 3D simulation competition, UT Austin Villa learned 19 layered behaviors and optimized over 500 parameters in total.



Different layered learning behaviors, with the number of parameters optimized for each behavior shown in parentheses. Solid black arrows show the number of learned and frozen parameters passed from previously learned layer behaviors, dashed red arrows show the number of overlapping parameters passed and relearned from one behavior to another, and dotted blue arrows show the number of parameter values passed as seed values for new parameters at the next layer of learning. Overlapping layers are colored by type: CILB in orange, PCLL in green, and PLLR in yellow.
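The three arrow types in the figure correspond to three ways a learned value can reach the next layer. The sketch below, assuming parameters are stored in a flat name-to-value dictionary, builds the starting point for the next layer. `init_next_layer`, the seed mapping, and the zero default for unseeded parameters are hypothetical choices, not taken from the article.

```python
def init_next_layer(prev_params, frozen_keys, overlap_keys, seed_map, new_keys, default=0.0):
    """Return (initial values, open parameter names) for the next layer.

    frozen_keys:  passed through and held fixed (solid black arrows)
    overlap_keys: passed through and relearned (dashed red arrows)
    seed_map:     new name -> previous name whose value seeds it (dotted blue arrows)
    new_keys:     all parameters introduced at the next layer
    """
    frozen_keys, overlap_keys = set(frozen_keys), set(overlap_keys)
    if frozen_keys & overlap_keys:
        raise ValueError("a parameter cannot be both frozen and relearned")
    # Frozen and overlapping parameters both start from their learned values.
    init = {k: prev_params[k] for k in frozen_keys | overlap_keys}
    # New parameters start from a seed value if one is passed, else a default.
    for k in new_keys:
        init[k] = prev_params[seed_map[k]] if k in seed_map else default
    # The optimizer may change overlapping and new parameters, not frozen ones.
    open_keys = overlap_keys | set(new_keys)
    return init, open_keys
```

Note that a seeded parameter is still a new, open parameter: only its starting value comes from the earlier layer, whereas an overlapping parameter is the earlier layer's own parameter being relearned.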


Sample videos of learning different behaviors can be found below.


Getup_Front_Primitive and Getup_Back_Primitive Learned Behaviors

Getting up from the front and from the back, before (left) and after (right) optimization.
Download videos: Getup Front mp4, Getup Back mp4






Learning Walk_GoToTarget and Walk_Sprint Behaviors

Agent navigating an obstacle course of targets it is told to move toward while executing the goToTarget optimization task. The agent's fitness is measured by how far/fast it can move toward each target (shown as a magenta dot on the field). It is penalized for any movement when told to stop and is also penalized if it falls over. This optimization was used to learn both the goToTarget and sprint walk parameter sets. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint.
Download video: mp4
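The goToTarget fitness described above can be sketched as follows, assuming the run is logged as a sequence of command segments with the agent's start and end positions. The page states only what is rewarded and penalized, not the formula, so the `Segment` layout, `go_to_target_fitness`, and the `FALL_PENALTY` value are all hypothetical. Speed enters implicitly under the assumption that segments have a fixed duration, so a faster walk covers more distance toward each target.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]

FALL_PENALTY = 5.0  # hypothetical penalty applied whenever the agent falls


@dataclass
class Segment:
    start: Point              # agent position when the command began
    end: Point                # agent position when the command ended
    target: Optional[Point]   # None means the agent was told to stop
    fell: bool = False


def go_to_target_fitness(segments):
    fitness = 0.0
    for s in segments:
        if s.target is None:
            # Told to stop: any movement is penalized.
            fitness -= math.dist(s.start, s.end)
        else:
            # Reward progress toward the commanded target.
            fitness += math.dist(s.start, s.target) - math.dist(s.end, s.target)
        if s.fell:
            fitness -= FALL_PENALTY
    return fitness
```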






Learning Walk_PositionToDribble Behavior

Agent dribbling the ball toward the goal from multiple starting points while executing the driveBallToGoal2 optimization task. The agent's fitness is measured by how far it can dribble each ball toward the goal in 15 seconds, and it is penalized if it dribbles the ball backwards. At the end of every 15 seconds the agent performs a set series of movements to check its stability and is penalized if it falls over. The optimization runs in simulation time, which is much faster than real time. This optimization was used to learn the positioning walk parameter set. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint, P (cyan) = positioning.
Download video: mp4
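A possible shape for the driveBallToGoal2 fitness, assuming the ball position is recorded at the start and end of each 15-second trial. The page gives no formula, so `drive_ball_to_goal_fitness`, the doubled weight on backward progress, the fall penalty, the averaging over starting points, and the default goal position are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]

BACKWARD_WEIGHT = 2.0  # hypothetical: backward dribbling counts double
FALL_PENALTY = 5.0     # hypothetical penalty for falling in the stability check


@dataclass
class DribbleTrial:
    ball_start: Point
    ball_end: Point        # ball position after the 15 s trial
    fell_in_check: bool    # agent fell during the end-of-trial stability moves


def drive_ball_to_goal_fitness(trials, goal=(15.0, 0.0)):
    """Average fitness over trials from different starting points."""
    total = 0.0
    for t in trials:
        progress = math.dist(t.ball_start, goal) - math.dist(t.ball_end, goal)
        if progress < 0:
            # Ball moved away from the goal: penalize more heavily.
            progress *= BACKWARD_WEIGHT
        total += progress
        if t.fell_in_check:
            total -= FALL_PENALTY
    return total / len(trials)
```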






Learning Walks Without Layered Learning

The robot attempts to transition between Dribble walk parameters (red 'D') and Fast walk parameters (yellow 'F') that were each learned in isolation. Without layered learning to learn the transition between walks, the agent is unstable and falls over.
Download video: mp4




Learning Walk_ApproachToKick Behavior

Agent attempts to approach and stop at a fixed offset position from the ball as fast as possible without running into the ball. This optimization task was used to learn the approach walk parameter set. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint, A (orange) = approach.
Download video: mp4
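The approach task rewards reaching the offset position quickly and penalizes touching the ball. One way to score a single trial, assuming each trial records the time to reach the offset, the remaining distance, and whether the ball was touched; the timeout, the collision penalty, and `approach_fitness` itself are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT = 10.0            # hypothetical trial length in seconds
COLLISION_PENALTY = 5.0   # hypothetical penalty for running into the ball


@dataclass
class ApproachTrial:
    time_to_reach: Optional[float]  # seconds until stopped at the offset; None if never
    final_distance: float           # distance (m) from the offset position at the end
    touched_ball: bool


def approach_fitness(trial):
    if trial.time_to_reach is None:
        # Never arrived: worse than any successful approach, graded by distance left.
        fitness = -TIMEOUT - trial.final_distance
    else:
        # Faster approaches score higher.
        fitness = -trial.time_to_reach
    if trial.touched_ball:
        fitness -= COLLISION_PENALTY
    return fitness
```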






Learning Kick_Long_Primitive

Agent is repeatedly beamed behind the ball at a fixed position and attempts to kick the ball. The agent's fitness is measured by how far it is able to kick the ball.
Download video: mp4






Learning Kick_Long_Behavior

Agent approaches the ball from different positions/angles and attempts to kick the ball. The agent's fitness is measured by how far it is able to kick the ball. This optimization task was used to integrate the approach walk parameter set with the kick through overlapping layered learning. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint, A (orange) = approach.
Download video: mp4






Dribbling and Kicking

Agent dribbles and kicks the ball using optimized behaviors for walking and kicking. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint, P (cyan) = positioning, A (orange) = approach.
Download video: mp4






Multiagent Kickoff_Kick_Behavior

One agent lightly touches the ball before a second agent kicks it into the goal during an indirect kickoff. The touch and kick skills were learned independently and then optimized to work together through overlapping layered learning.
Download video: mp4






Multiagent Kickoff Failure

Learning a multiagent indirect kickoff behavior, with one agent touching the ball before another kicks it, fails without overlapping layered learning: the agents interfere with each other during the optimization process.
Download video: mp4


For any questions, please contact Patrick MacAlpine.