Finding Promising Exploration Regions by Weighting Expected Navigation Costs.
GMD Technical Report, Arbeitspapiere der GMD 987, April 1996.
In many learning tasks, querying for data is neither free nor of constant
cost; often the cost of a query depends on the distance from the agent's
current location in state space to the desired query point. In such
settings, much can be gained by keeping track of (1) the length of the
shortest path from each state to every other state, and (2) the first
action to take on each of these paths. With this information, a learning
agent can explore its environment efficiently: at every step it computes
the action that moves it toward the region of greatest estimated
exploration benefit, balancing the exploration potential of all reachable
states encountered so far against their currently estimated distances.
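The bookkeeping described above can be sketched as follows: an all-pairs
shortest-path computation (here Floyd-Warshall) that records the first
action on each path, followed by a greedy choice of target state. The
benefit-minus-distance score is one simple way to weight exploration
potential against navigation cost, an illustrative assumption rather than
the report's exact weighting; the edge encoding and function names are
likewise hypothetical.

```python
def plan_exploration(n_states, edges, benefit, current):
    """Track (1) shortest-path lengths between all state pairs and
    (2) the first action on each path, then pick the action leading
    toward the most promising region.

    edges:   dict mapping (state, action) -> (successor, cost)
    benefit: per-state estimated exploration benefit (assumed given)
    """
    INF = float("inf")
    dist = [[INF] * n_states for _ in range(n_states)]
    first = [[None] * n_states for _ in range(n_states)]  # first action on path
    for s in range(n_states):
        dist[s][s] = 0.0
    for (s, a), (t, c) in edges.items():
        if c < dist[s][t]:
            dist[s][t] = c
            first[s][t] = a
    # Floyd-Warshall relaxation; carry the first action along each path.
    for k in range(n_states):
        for i in range(n_states):
            if dist[i][k] == INF:
                continue
            for j in range(n_states):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    first[i][j] = first[i][k]
    # Weight each reachable state's benefit against its distance; the
    # subtraction below is a placeholder for the paper's weighting rule.
    best, best_score = None, -INF
    for t in range(n_states):
        if t == current or dist[current][t] == INF:
            continue
        score = benefit[t] - dist[current][t]
        if score > best_score:
            best, best_score = t, score
    action = first[current][best] if best is not None else None
    return best, action
```

For example, with three states where state 2 has high benefit and is
cheaper to reach via state 1 than directly, the agent's first move is the
action toward state 1, even though the ultimate target is state 2.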