The action selection method is designed to make use of memory to
select the action most likely to succeed, and to fill memory when no
useful memories are available. For example, when the defender is at
position , the agent begins by retrieving
and
as described in Section 2.3.2. Then, it acts
according to the following function:
An action is selected based on the memory values only if those values
indicate that one action is likely to succeed and that it is better
than the other. If, on the other hand, neither value nor
indicates a positive likelihood of success, then an action
is chosen randomly. The only exception to this rule is when
one of the values is zero,
suggesting that there have not
yet been any training examples for that action at that memory
location. In this case, the agent is biased towards exploring the
untried action in order to fill out memory.
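The selection rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and argument names are hypothetical, it assumes exactly two candidate actions, and it treats the bias towards an untried action as a deterministic preference rather than a weighted one.

```python
import random

def select_action(value_a: float, value_b: float) -> str:
    """Choose between two actions using their retrieved memory values.

    Hypothetical sketch of the rule in the text: exploit the better
    action when memory indicates likely success, explore an untried
    (zero-valued) action to fill memory, otherwise choose randomly.
    """
    # Bias: prefer an untried action (value exactly zero) so that
    # memory gets filled with a training example for it.
    if value_a == 0.0 and value_b != 0.0:
        return "a"
    if value_b == 0.0 and value_a != 0.0:
        return "b"
    # Exploit: pick an action only if its value indicates a positive
    # likelihood of success AND it is better than the alternative.
    if value_a > 0.0 and value_a > value_b:
        return "a"
    if value_b > 0.0 and value_b > value_a:
        return "b"
    # Otherwise (neither value positive, or a tie), act randomly.
    return random.choice(["a", "b"])
```

Note that when both values are zero (no training examples for either action), the rule falls through to the random choice, which still serves to fill memory.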