In this section, we present the results of our experiments. We begin by finding an appropriate memory size to use for this task. Then we explore our agent's ability to learn time-varying and nondeterministic defender behavior, introducing a more sophisticated memory storage technique.
While examining the results, keep in mind that even if the agent used
the optimal decision functions to choose between shooting and passing,
its success rate would be significantly less than 100% (and would
differ for different defender speeds): from many defender starting
positions, neither shooting nor passing led to a goal (see Figure 2).
Figure 2: For different defender starting positions (solid rectangle), the agent can
score when a) shooting, b) passing, c) neither, or d) both.
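The four cases in Figure 2 can be sketched as a small classification routine. The functions `can_score_shoot` and `can_score_pass` below are hypothetical stand-ins with invented toy rules; in the actual task, each outcome would be determined by rolling out an episode from the given defender starting position.

```python
def can_score_shoot(pos):
    # Toy rule standing in for a simulator rollout; not the paper's model.
    return 20 < pos < 60

def can_score_pass(pos):
    # Toy rule standing in for a simulator rollout; not the paper's model.
    return 40 < pos < 80

def classify(pos):
    """Classify a defender starting position by which actions score."""
    shoot_ok = can_score_shoot(pos)
    pass_ok = can_score_pass(pos)
    if shoot_ok and pass_ok:
        return "both"      # case (d) in Figure 2
    if shoot_ok:
        return "shooting"  # case (a)
    if pass_ok:
        return "passing"   # case (b)
    return "neither"       # case (c)

# Sample positions covering all four cases under the toy rules above.
print([classify(p) for p in (10, 30, 50, 70)])
```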
For example, from our experiments with the defender moving at a
constant speed of 50, we found
that an agent acting optimally scores 73.6% of the time; an agent
acting randomly scores only 41.3% of the time. These values provide
useful reference points for evaluating our learning agent's
performance; we indicate the scoring rate of an optimally acting agent
on our graphs.
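Given a table of per-position outcomes like those in Figure 2, both reference rates follow directly: an optimal agent scores whenever at least one action works, while a uniformly random agent that shoots or passes with equal probability scores only half the time when exactly one action works. A minimal sketch, with a per-position outcome table invented for illustration (not the paper's data):

```python
# Each entry is (shoot_scores, pass_scores) for one defender starting
# position. These booleans are invented for illustration only.
outcomes = [
    (True, True),    # both actions score
    (True, False),   # only shooting scores
    (False, True),   # only passing scores
    (False, False),  # neither scores
    (True, False),   # only shooting scores
]

# Optimal agent: scores whenever at least one action works.
optimal_rate = sum(s or p for s, p in outcomes) / len(outcomes)

# Random agent: picks shoot or pass with probability 1/2 each.
random_rate = sum(0.5 * s + 0.5 * p for s, p in outcomes) / len(outcomes)

print(optimal_rate, random_rate)
```

With the outcome table produced by the actual simulator, the same computation would yield rates such as the 73.6% and 41.3% reported above.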
To increase the optimal success rate, we also experimented with
allowing the agent not to act for a given defender position, i.e.,
when neither shooting nor passing was likely to work. The agent then
scored 100% of the time by waiting until the defender moved into a
position from which scoring was possible. In this setup, however, we
had a difficult time collecting meaningful results: the agent learned
how to score when the defender was in a single position and then acted
only when the defender was near that position. Therefore, we required
the agent to act immediately.