We began our experiments by passing the ball with the same trajectory
and the same speed in all training and testing examples. Under this
condition of fixed ball motion, the shooter could always aim at the
same point wide of the goal, guaranteeing that if contact was made,
the ball would be propelled in the right direction. In other words,
the shooter used a constant aiming policy.
We determined that with the trajectory ( ) and speed (
units/sec) of the ball we were initially using, the shooter would
score on contacting the ball if its steering line aimed 170 units
wide of the center of the goal (illustrated in Figure 3(b)). This
aiming point remains constant throughout this section and Section 4.2.
Before setting up any learning experiments, we found a simple fixed shooting policy that allowed the shooter to score consistently when starting at the exact center of its range of initial positions. From this position, the shooter scored consistently if it began accelerating once the ball's distance to its projected point of intersection with the agent's path had fallen to 110 units or less. We call this policy the simple shooting policy. However, this simple policy was clearly not appropriate for the entire range of shooter positions that we considered: when the shooter used it while starting at random positions, it scored only 60.8% of the time.
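
As a rough illustration, the simple shooting policy can be sketched as a single distance threshold. This is a minimal sketch rather than the code used in the experiments: the function names, the 2D tuple representation of positions, and the assumption that the projected intersection point is already known are all hypothetical. Only the 110-unit threshold comes from the text.

```python
import math

# Threshold from the text: begin accelerating once the ball is within
# 110 units of its projected intersection with the shooter's path.
ACCEL_THRESHOLD = 110.0


def ball_distance_to_intersection(ball_pos, intersection):
    """Euclidean distance from the ball to the projected intersection point.

    Both arguments are hypothetical (x, y) tuples in field units.
    """
    return math.hypot(intersection[0] - ball_pos[0],
                      intersection[1] - ball_pos[1])


def should_accelerate(ball_pos, intersection, threshold=ACCEL_THRESHOLD):
    """Simple shooting policy: accelerate iff the ball is close enough."""
    return ball_distance_to_intersection(ball_pos, intersection) <= threshold


# Usage: the shooter waits while the ball is 150 units away and
# starts accelerating once it is 100 units away.
print(should_accelerate((0.0, 0.0), (150.0, 0.0)))
print(should_accelerate((50.0, 0.0), (150.0, 0.0)))
```

Because the threshold is fixed, the decision ignores where the shooter starts, which is consistent with the policy working from the center position but degrading when the starting position varies.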