Matt Luciw, Vincent Graziano, Mark Ring, Jürgen Schmidhuber. Artificial
Curiosity with Planning for Autonomous Visual and Perceptual
Development. In Proc. Joint IEEE International Conference
on Development and Learning (ICDL) and on Epigenetic Robotics
(ICDL-EpiRob 2011), Frankfurt, 2011.
Abstract
Autonomous agents that learn from reward on high-dimensional visual
observations must simplify the raw observations in both space (i.e.,
dimensionality reduction) and time (i.e., prediction), so that
reinforcement learning becomes tractable and effective.
Training the spatial and temporal models requires an appropriate
sampling scheme, which cannot be hard-coded if the algorithm is to
be general. Intrinsic rewards are associated with samples that best
improve the agent’s model of the world. Yet the dynamic nature of an
intrinsic reward signal presents a major obstacle to successfully
realizing an efficient curiosity drive. TD-based incremental
reinforcement learning approaches fail to adapt quickly enough to
effectively exploit the curiosity signal. In this paper, a novel
artificial curiosity system with planning is implemented, based on
developmental or continual learning principles. Least-squares policy
iteration is used with an agent’s internal forward model, to
efficiently assign values for maximizing combined external and
intrinsic reward. The properties of this system are illustrated in a
high-dimensional, noisy, visual environment that requires the agent
to explore. With no useful external value information early on, the
self-generated intrinsic values lead to actions that improve both the
agent's spatial (perceptual) and temporal (cognitive) models. Curiosity
also leads the agent to learn how it could act to maximize external reward.
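The core reward mechanism named in the abstract (intrinsic reward for samples that best improve the agent's world model, combined with external reward) can be illustrated with a minimal learning-progress sketch. The linear forward model, the dynamics matrix `A`, and the mixing weight `beta` below are illustrative assumptions, not the paper's actual components, and the LSPI planner is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

class ForwardModel:
    """Toy linear forward model: predicts the next observation from the current one."""
    def __init__(self, dim, lr=0.1):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def prediction_error(self, obs, next_obs):
        return float(np.mean((self.W @ obs - next_obs) ** 2))

    def update(self, obs, next_obs):
        """One gradient step on the squared prediction error.

        Returns the error before and after the step, so the caller can
        measure learning progress on this sample.
        """
        before = self.prediction_error(obs, next_obs)
        grad = np.outer(self.W @ obs - next_obs, obs)  # d(0.5*||Wx - y||^2)/dW
        self.W -= self.lr * grad
        after = self.prediction_error(obs, next_obs)
        return before, after

dim = 4
A = 0.5 * np.eye(dim)   # hypothetical true environment dynamics (demo assumption)
beta = 1.0              # weight on intrinsic reward (demo assumption)
model = ForwardModel(dim)

intrinsic_rewards = []
for _ in range(20):
    obs = rng.standard_normal(dim)
    next_obs = A @ obs                   # observe a transition
    before, after = model.update(obs, next_obs)
    intrinsic = before - after           # learning progress = model improvement
    external = 0.0                       # no useful external reward early on
    total = external + beta * intrinsic  # combined reward signal
    intrinsic_rewards.append(intrinsic)
```

Early on, transitions the model predicts poorly yield large improvement and hence large intrinsic reward; as the model converges, the intrinsic reward for familiar transitions fades, which is exactly the dynamic, non-stationary reward signal the abstract says slow TD-based learners fail to exploit.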