
Experimental Setup

We simulate this situation by placing a ball and a stationary car acting as the ``teammate'' in specific places on the field. Then we place another car, the ``defender,'' in front of the goal. The defender moves in a small circle in front of the goal at some speed and begins at some random point along this circle. The learning agent (``you'') must take one of two possible actions: shoot straight towards the goal, or pass to the teammate so that the ball will rebound towards the goal. A snapshot of the experimental setup is shown graphically in Figure 1.

The task is essentially to learn two functions, each with one continuous input variable, namely the defender's position. Based on this position, which can be represented unambiguously as the angle at which the defender is facing, $\theta$, the agent tries to learn the probability of scoring when shooting, $S(\theta)$, and the probability of scoring when passing, $P(\theta)$. If these functions were learned completely, which would only be possible if the defender's motion were deterministic, then both functions would be binary partitions: $S, P : [0^\circ, 360^\circ) \rightarrow \{0, 1\}$. That is, the agent would know without doubt for any given $\theta$ whether a shot, a pass, both, or neither would achieve its goal. However, since the agent cannot have had experience for every possible $\theta$, and since the defender may not move at the same speed each time, the learned functions must be approximations: $S', P' : [0^\circ, 360^\circ) \rightarrow [0, 1]$.
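To make the distinction concrete, the following sketch (not the paper's representation) shows what the ideal binary partitions $S$ and $P$ would look like if the defender's motion were deterministic; the scoring intervals used here are hypothetical examples chosen only for illustration.

```python
# Hypothetical illustration: with a deterministic defender, S and P would be
# binary partitions of the angle range, e.g. intervals of theta in which each
# action scores.  The interval values below are assumed, not measured data.

SHOT_SCORING_INTERVALS = [(0.0, 80.0), (280.0, 360.0)]   # assumed intervals
PASS_SCORING_INTERVALS = [(60.0, 200.0)]                  # assumed interval

def S_ideal(theta):
    """Ideal shooting function: 1 if a shot scores at defender angle theta, else 0."""
    t = theta % 360.0
    return int(any(lo <= t < hi for lo, hi in SHOT_SCORING_INTERVALS))

def P_ideal(theta):
    """Ideal passing function: 1 if a pass scores at defender angle theta, else 0."""
    t = theta % 360.0
    return int(any(lo <= t < hi for lo, hi in PASS_SCORING_INTERVALS))
```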

In order to enable the agent to learn approximations to the functions $S$ and $P$, we gave it a memory in which it could store its experiences and from which it could retrieve its current approximations $S'$ and $P'$. We explored and developed appropriate methods of storing to and retrieving from memory and an algorithm for deciding what action to take based on the retrieved values.
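The sketch below is a minimal illustration of this idea under assumptions of our own: each experience is stored as an (angle, action, outcome) triple, $S'(\theta)$ and $P'(\theta)$ are estimated by averaging outcomes of past experiences within a fixed angular window, and the agent takes whichever action has the higher estimate. The actual memory model and decision procedure are described in the following section; the window size, default prior, and averaging rule here are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict

memory = defaultdict(list)   # action -> list of (theta, scored) experiences
WINDOW = 15.0                # assumed neighborhood around theta, in degrees

def store(theta, action, scored):
    """Record one experience: defender angle, action taken ('shoot' or 'pass'),
    and whether the agent scored (1) or not (0)."""
    memory[action].append((theta % 360.0, scored))

def angular_distance(a, b):
    """Smallest angular separation between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def retrieve(theta, action):
    """Estimate the probability of scoring with `action` at angle `theta` by
    averaging remembered outcomes within WINDOW degrees of theta."""
    nearby = [scored for t, scored in memory[action]
              if angular_distance(t, theta % 360.0) <= WINDOW]
    return sum(nearby) / len(nearby) if nearby else 0.5   # assumed prior when no data

def choose_action(theta):
    """Take the action with the higher estimated scoring probability."""
    return "shoot" if retrieve(theta, "shoot") >= retrieve(theta, "pass") else "pass"
```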


