Peter Stone's Selected Publications



Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents

Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents.
Arrasy Rahman, Jiaxun Cui, and Peter Stone.
In AAAI, February 2024.
Video of the conference presentation: https://www.youtube.com/watch?v=5ebmxMpEsys

Download

[PDF] 1.3MB   [slides.ppt] 9.2MB   [poster.pdf] 956.2kB

Abstract

Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse teammate policies obtained through maximizing specific diversity metrics. However, prior heuristic-based diversity metrics do not always maximize the agent's robustness in all cooperative problems. In this work, we first propose that maximizing an AHT agent's robustness requires it to emulate policies in the minimum coverage set (MCS), the set of best-response policies to any partner policies in the environment. We then introduce the L-BRDiv algorithm that generates a set of teammate policies that, when used for AHT training, encourage agents to emulate policies from the MCS. L-BRDiv works by solving a constrained optimization problem to jointly train teammate policies for AHT training and approximating AHT agent policies that are members of the MCS. We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems without the need for extensive hyperparameter tuning for its objectives. Our study shows that L-BRDiv outperforms the baseline methods by prioritizing discovering distinct members of the MCS instead of repeatedly finding redundant policies.
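To make the idea of a minimum coverage set concrete, here is a toy sketch in Python. It is not L-BRDiv and not the paper's formal definition; it rests on several assumptions of our own: a one-shot, two-player, common-payoff matrix game (the hypothetical `PAYOFF` table), partner policies restricted to a finite grid of mixed strategies, and ego policies restricted to pure actions. The last restriction loses nothing here, because a mixed ego strategy is a best response to a partner only if every action in its support is, so it never covers a partner that its pure actions do not already cover. All function names (`simplex_grid`, `best_responses`, `minimum_coverage_set`) are hypothetical helpers for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical common-payoff game (an assumption, not from the paper):
# rows = ego (AHT agent) actions, columns = partner actions.
# Actions 0-2 are three coordination "conventions"; action 3 is a hedge
# that pays 3/10 no matter what the partner does.
PAYOFF = [
    [Fraction(1), Fraction(0), Fraction(0)],
    [Fraction(0), Fraction(1), Fraction(0)],
    [Fraction(0), Fraction(0), Fraction(1)],
    [Fraction(3, 10)] * 3,
]


def simplex_grid(k, n):
    """Enumerate partner mixed strategies over k actions whose probabilities are multiples of 1/n."""
    def compositions(total, parts):
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest
    return [tuple(Fraction(c, n) for c in comp) for comp in compositions(n, k)]


def best_responses(payoff, partner):
    """Ego actions that maximize expected common payoff against a partner mixed strategy."""
    values = [sum(p * r for p, r in zip(partner, row)) for row in payoff]
    best = max(values)
    return frozenset(i for i, v in enumerate(values) if v == best)


def minimum_coverage_set(payoff, partners):
    """Smallest set of ego actions containing a best response to every partner.

    Brute force over subsets in increasing size; only suitable for toy games.
    """
    br_sets = [best_responses(payoff, q) for q in partners]
    actions = range(len(payoff))
    for size in range(1, len(payoff) + 1):
        for subset in combinations(actions, size):
            chosen = set(subset)
            if all(chosen & br for br in br_sets):
                return chosen
    # Unreachable in practice: the full action set always covers every partner.
    return set(actions)


if __name__ == "__main__":
    partners = simplex_grid(3, 6)
    mcs = minimum_coverage_set(PAYOFF, partners)
    print(sorted(mcs))  # [0, 1, 2]
```

In this toy game the hedging action is never a best response (every grid partner puts at least 1/3 on some convention), so it is redundant and stays out of the set, echoing the abstract's point about avoiding redundant policies. Raising the hedge's payoff to 2/5 makes it the unique best response to the uniform partner, and the minimum coverage set grows to all four actions.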

BibTeX Entry

@InProceedings{rahman_minimum_AAAI24,
  author   = {Arrasy Rahman and Jiaxun Cui and Peter Stone},
  title    = {Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents},
  booktitle = {AAAI},
  year     = {2024},
  month    = {February},
  location = {Vancouver, Canada},
  abstract = {Robustly cooperating with unseen agents and human partners presents significant
challenges due to the diverse cooperative conventions these partners may adopt.
Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an
agent with a population of diverse teammate policies obtained through maximizing
specific diversity metrics. However, prior heuristic-based diversity metrics do
not always maximize the agent's robustness in all cooperative problems. In this
work, we first propose that maximizing an AHT agent's robustness requires it to
emulate policies in the minimum coverage set (MCS), the set of best-response
policies to any partner policies in the environment. We then introduce the
L-BRDiv algorithm that generates a set of teammate policies that, when used for
AHT training, encourage agents to emulate policies from the MCS. L-BRDiv works by
solving a constrained optimization problem to jointly train teammate policies for
AHT training and approximating AHT agent policies that are members of the MCS. We
empirically demonstrate that L-BRDiv produces more robust AHT agents than
state-of-the-art methods in a broader range of two-player cooperative problems
without the need for extensive hyperparameter tuning for its objectives. Our
study shows that L-BRDiv outperforms the baseline methods by prioritizing
discovering distinct members of the MCS instead of repeatedly finding redundant
policies.
  },
  wwwnote  = {<a href="https://www.youtube.com/watch?v=5ebmxMpEsys">the conference presentation</a>},
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Nov 19, 2024 10:24:41