Two Methods for Hierarchy Learning in Reinforcement Environments, in
From Animals to Animats 2: Proceedings of the Second International
Conference on Simulation of Adaptive Behavior (SAB '92), 1992.
This paper describes two methods for hierarchically organizing temporal
behaviors. The first is more intuitive: grouping together common
sequences of events into single units so that they may be treated as
individual behaviors. This system immediately encounters
problems, however: because the units are binary, the behaviors
must execute completely or not at all, which hinders the
construction of good training algorithms. The system also runs
into difficulty when more than one unit is (or should be) active at the
same time. The second system is a hierarchy of transition values. This hierarchy
dynamically modifies the values that specify the degree to which one
unit should follow another. These values are continuous, allowing
the use of gradient descent during learning. Furthermore, many
units are active at the same time as part of the system's normal
functioning.
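
The idea of continuous transition values that are modified dynamically can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the additive form of the modification, and the array layout (base, modifiers, prev_activity) are all assumptions made for illustration only. Here base[i][j] stands for the degree to which unit j should follow unit i, and a second level of values shifts that degree depending on which units were active one step earlier.

```python
# Illustrative sketch only; names and the additive update are assumptions,
# not the method described in the paper.

def effective_transitions(base, modifiers, prev_activity):
    """Adjust each transition value according to the previous step's activity.

    base[i][j]         -- degree to which unit j should follow unit i
    modifiers[i][j][k] -- how much earlier activity of unit k shifts base[i][j]
    prev_activity[k]   -- continuous activity of unit k one step earlier
    """
    n = len(base)
    return [
        [
            base[i][j]
            + sum(modifiers[i][j][k] * prev_activity[k] for k in range(n))
            for j in range(n)
        ]
        for i in range(n)
    ]


def predict_next(current, transitions):
    """Combine all currently active units into a prediction for the next step."""
    n = len(current)
    return [
        sum(current[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


# Two units. Normally unit 1 follows unit 0 with strength 1.0, but that
# transition is weakened when unit 1 was active on the previous step.
base = [[0.0, 1.0], [1.0, 0.0]]
modifiers = [[[0.0, 0.0], [0.0, -0.5]], [[0.0, 0.0], [0.0, 0.0]]]

weakened = predict_next([1.0, 0.0], effective_transitions(base, modifiers, [0.0, 1.0]))
unchanged = predict_next([1.0, 0.0], effective_transitions(base, modifiers, [1.0, 0.0]))
```

Because every quantity in this sketch is continuous and every operation is a sum or product, the prediction is differentiable with respect to both base and modifiers, which is the property that makes gradient descent applicable. Several units can also contribute to the prediction at once, in contrast to the all-or-nothing binary units of the first method.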