Peter Stone's Selected Publications



N-Agent Ad Hoc Teamwork

N-Agent Ad Hoc Teamwork.
Caroline Wang, Arrasy Rahman, Ishan Durugkar, Elad Liebman, and Peter Stone.
In Conference on Neural Information Processing Systems (NeurIPS), December 2024.

Download

[PDF] (1.6MB)  [slides.pdf] (1.3MB)  [poster.pdf] (1.7MB)

Abstract

Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent in the scenario. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards expanding the class of scenarios that cooperative learning methods may optimally address, we introduce N-agent ad hoc teamwork (NAHT), where a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates. This paper formalizes the problem and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammate behaviors by learning representations of teammate behaviors. Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches and enables out-of-distribution generalization to unseen teammates.
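
To make the agent-modeling idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: an encoder embeds an observed teammate's behavior history, and the controlled agent's policy conditions on that embedding, trained with a basic policy-gradient step. All class names, network sizes, and the dummy training step are illustrative assumptions for exposition, not the authors' POAM implementation.

    # Hypothetical sketch of agent modeling for NAHT-style adaptation.
    # Names and architectures are assumptions, not the POAM code.
    import torch
    import torch.nn as nn

    class TeammateEncoder(nn.Module):
        """Embeds a history of teammate (observation, action) pairs
        into a fixed-size representation of teammate behavior."""
        def __init__(self, obs_dim, act_dim, embed_dim=32):
            super().__init__()
            self.rnn = nn.GRU(obs_dim + act_dim, embed_dim, batch_first=True)

        def forward(self, obs_act_history):   # (batch, time, obs_dim + act_dim)
            _, h = self.rnn(obs_act_history)
            return h.squeeze(0)               # (batch, embed_dim)

    class ConditionedPolicy(nn.Module):
        """Controlled agent's policy, conditioned on the teammate
        embedding so behavior can adapt to unseen teammates."""
        def __init__(self, obs_dim, act_dim, embed_dim=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + embed_dim, 64), nn.ReLU(),
                nn.Linear(64, act_dim),
            )

        def forward(self, obs, teammate_embed):
            logits = self.net(torch.cat([obs, teammate_embed], dim=-1))
            return torch.distributions.Categorical(logits=logits)

    # One policy-gradient step on dummy data (placeholder advantage).
    obs_dim, act_dim = 8, 4
    encoder = TeammateEncoder(obs_dim, act_dim)
    policy = ConditionedPolicy(obs_dim, act_dim)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(policy.parameters()), lr=3e-4)

    history = torch.randn(1, 10, obs_dim + act_dim)  # observed teammate behavior
    obs = torch.randn(1, obs_dim)                    # controlled agent's observation
    dist = policy(obs, encoder(history))
    action = dist.sample()
    advantage = torch.tensor(1.0)                    # placeholder advantage estimate
    loss = -(dist.log_prob(action) * advantage).mean()
    opt.zero_grad(); loss.backward(); opt.step()

In this sketch, training the encoder jointly with the policy is what lets the policy's behavior vary with the inferred teammate type, which is the adaptation mechanism the abstract attributes to learning representations of teammate behaviors.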

BibTeX Entry

@InProceedings{wang-naht-24,
  author    = {Caroline Wang and Arrasy Rahman and Ishan Durugkar and Elad Liebman and Peter Stone},
  title     = {N-Agent Ad Hoc Teamwork},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2024},
  month     = {December},
  location  = {Vancouver, Canada},
  abstract  = {Current approaches to learning cooperative multi-agent behaviors assume
relatively restrictive settings. In standard fully cooperative multi-agent
reinforcement learning, the learning algorithm controls all agents in the
scenario, while in ad hoc teamwork, the learning algorithm usually assumes
control over only a single agent in the scenario. However, many cooperative
settings in the real world are much less restrictive. For example, in an
autonomous driving scenario, a company might train its cars with the same
learning algorithm, yet once on the road, these cars must cooperate with cars
from another company. Towards expanding the class of scenarios that cooperative
learning methods may optimally address, we introduce N-agent ad hoc teamwork
(NAHT), where a set of autonomous agents must interact and cooperate with
dynamically varying numbers and types of teammates. This paper formalizes the
problem, and proposes the Policy Optimization with Agent Modelling (POAM)
algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach
to the NAHT problem that enables adaptation to diverse teammate behaviors by
learning representations of teammate behaviors. Empirical evaluation on tasks
from the multi-agent particle environment and StarCraft II shows that POAM
improves cooperative task returns compared to baseline approaches, and enables
out-of-distribution generalization to unseen teammates.
  },
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Nov 19, 2024 10:24:40