Recall that when the shooter started in the center of its range, it
would score using the simple shooting policy: it began moving when the
Ball Distance was 110 units or less. However, to get a diverse
training sample, we replaced this shooting policy with a random
shooting policy of the form ``at each opportunity, begin moving with
probability 1/x.'' To help choose x, we determined that
the shooter had about 25 decision opportunities before the ball moved
within 110 units of the Contact Point. Since we wanted the shooter to
start moving before or after these 25 decision cycles with roughly
equal probability so as to get a balanced training sample, we solved
the equation
\[ \left(\frac{x-1}{x}\right)^{25} = 0.5 \quad\Longrightarrow\quad x \approx 37. \]
Hence, when using the random shooting policy, the shooter started
moving with probability 1/37 at each decision point.
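The choice of 1/37 can be checked numerically. The following Python sketch solves for the x at which the shooter is equally likely to have started or not started moving after a given number of decision opportunities. The function name solve_for_x is hypothetical and used only for illustration; it is not part of the original system.

```python
def solve_for_x(n_opportunities: int, target: float = 0.5) -> float:
    """Return x such that (1 - 1/x) ** n_opportunities == target.

    (1 - 1/x) is the probability of NOT starting at a single opportunity,
    so the left-hand side is the probability of still waiting after
    n_opportunities independent decisions.
    """
    per_step_wait = target ** (1.0 / n_opportunities)
    return 1.0 / (1.0 - per_step_wait)


x = solve_for_x(25)

# Probability of still waiting after 25 opportunities when x is rounded to 37.
p_still_waiting = (1.0 - 1.0 / 37.0) ** 25

print(f"x = {x:.2f}, P(still waiting after 25 cycles | x = 37) = {p_still_waiting:.3f}")
```

The exact solution lies between 36 and 37, so rounding to 37 leaves the probability of waiting through all 25 cycles very close to one half.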
Using this shooting policy, we then collected training data. Each instance consisted of four numbers: the three inputs (Ball Distance, Agent Distance, and Heading Offset) at the time that the shooter began accelerating and a 1 or 0 to indicate whether the shot was successful or not. A shot was successful only if it went directly from the front of the shooter into the goal as illustrated in Figure 3(b): a trial was halted unsuccessfully if the ball hit any corner or side of the shooter, or if the ball hit any wall other than the goal.
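As a minimal sketch of the data collection described above, the Python code below represents a training instance as a record of the three inputs plus a success label, and samples the decision cycle at which the random policy begins moving. All names here (TrainingInstance, first_move_cycle, the max_cycles cutoff) are assumptions for illustration; the simulator itself is not modeled, so no shot outcomes are generated.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingInstance:
    # The three inputs, recorded when the shooter begins accelerating.
    ball_distance: float
    agent_distance: float
    heading_offset: float
    # 1 if the ball went directly from the front of the shooter into the goal, else 0.
    success: int


def first_move_cycle(rng: random.Random, p: float, max_cycles: int = 1000) -> Optional[int]:
    """Return the 1-based decision cycle at which the shooter starts moving.

    At each opportunity the shooter starts with probability p. Returns None
    if it never starts within max_cycles (an assumed cutoff).
    """
    for cycle in range(1, max_cycles + 1):
        if rng.random() < p:
            return cycle
    return None


rng = random.Random(0)
n_trials = 10_000
starts = [first_move_cycle(rng, 1.0 / 37.0) for _ in range(n_trials)]

# Fraction of trials in which the shooter started within the first 25 cycles.
early_fraction = sum(1 for c in starts if c is not None and c <= 25) / n_trials
print(f"started within 25 cycles: {early_fraction:.3f}")
```

Under these assumptions, roughly half of the sampled trials begin moving within the first 25 decision cycles and half begin later, which is the balance the random policy was designed to produce.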
Running 2990 trials in this manner gave us sufficient training data to learn to shoot a moving ball into the goal. The success rate using the random shooting policy was 19.7%; that is, only 590 of the 2990 training examples were positive instances.
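The class balance of the training set follows directly from these counts. The short Python check below recomputes it; the variable names are ours, and the numbers are those reported above.

```python
n_trials = 2990      # total training examples
n_positive = 590     # successful shots

n_negative = n_trials - n_positive
success_rate_pct = 100.0 * n_positive / n_trials

print(f"positive: {n_positive}, negative: {n_negative}, "
      f"success rate: {success_rate_pct:.1f}%")
```

The training set is therefore skewed toward negative instances by roughly four to one.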