For challenge responses that address the machine learning issue (particularly on-line learning), evaluation should be carried out both against the publicly available teams and against at least one previously unseen team.
First, each team will play games under normal conditions against the other challenge entrants and against the publicly available teams, which include both AI-based and non-AI-based teams. This stage evaluates a team's general performance.
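The following is a minimal sketch of how this first stage might be scheduled, assuming a simple match harness; the team names and the play_match hook are hypothetical and not part of the challenge specification.

    from itertools import combinations
    from typing import Dict, List, Tuple

    PUBLIC_TEAMS: List[str] = ["public_ai_team", "public_non_ai_team"]  # publicly released teams
    UNSEEN_TEAMS: List[str] = ["withheld_team"]                         # withheld by the organizers
    CHALLENGERS: List[str] = ["challenger_A", "challenger_B"]

    def play_match(team_a: str, team_b: str) -> Dict[str, int]:
        """Placeholder: run one full game under normal rules and return goals per team."""
        raise NotImplementedError("Hook this up to the actual simulator.")

    def schedule() -> List[Tuple[str, str]]:
        """Every challenger meets every other challenger, every public team,
        and every previously unseen team."""
        fixtures = list(combinations(CHALLENGERS, 2))
        fixtures += [(c, o) for c in CHALLENGERS for o in PUBLIC_TEAMS + UNSEEN_TEAMS]
        return fixtures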
Next, teams will play a set of defined benchmarks. For example, after fixing their programs, challengers must play a part of the game, starting from defined player positions, with the movement of the opponents pre-defined but not disclosed to the challengers. After several repetitions of the sequence, performance will be evaluated to determine whether the team was able to improve with experience. The movement of the opponents is not coded as absolute coordinate positions, but as a set of algorithms that generate motion sequences. These opponent algorithms will be provided by the organizers of the challenge, who will withhold at least one successful team from public release for this purpose.
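To make this benchmark concrete, the sketch below shows one possible shape of the repeated-sequence evaluation loop, assuming a scalar performance score per sequence. The names OpponentMotionModel, Challenger, run_sequence, and evaluate_learning, as well as the initial positions, are hypothetical illustrations; the real opponent algorithms and scoring are defined by the organizers.

    import random
    from typing import List, Tuple

    # Fixed starting positions for the challenger's players (illustrative values).
    INITIAL_POSITIONS: List[Tuple[float, float]] = [(-10.0, 0.0), (-20.0, 5.0), (-20.0, -5.0)]

    class OpponentMotionModel:
        """Generates opponent motion algorithmically rather than replaying
        absolute coordinates; stands in for a withheld team's behaviour."""

        def __init__(self, seed: int) -> None:
            self.rng = random.Random(seed)

        def next_move(self, step: int) -> Tuple[float, float]:
            # Placeholder policy; the withheld algorithm would be far richer.
            return (self.rng.uniform(-1.0, 1.0), self.rng.uniform(-1.0, 1.0))

    class Challenger:
        """Stub for a challenge entry that may learn on-line during play."""

        def reset(self, positions: List[Tuple[float, float]]) -> None:
            self.positions = list(positions)

        def act(self, opponent_move: Tuple[float, float]) -> float:
            # Returns an incremental reward for this step (e.g. progress toward goal).
            return 0.0

    def run_sequence(challenger: Challenger, opponents: OpponentMotionModel,
                     steps: int = 300) -> float:
        """Plays one benchmark sequence from the fixed initial positions and
        returns a scalar performance score for that sequence."""
        challenger.reset(INITIAL_POSITIONS)
        return sum(challenger.act(opponents.next_move(step)) for step in range(steps))

    def evaluate_learning(challenger: Challenger, n_sequences: int = 10) -> List[float]:
        """Repeats the sequence several times; an upward trend in the
        per-sequence scores indicates improvement with experience."""
        opponents = OpponentMotionModel(seed=0)
        return [run_sequence(challenger, opponents) for _ in range(n_sequences)]

Under such a scheme, improvement with experience could be judged, for example, by comparing the average score over the first few sequences with that over the last few.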
Other benchmarks that clearly evaluate learning performance will be announced after discussion with challenge participants.