N-Agent Ad Hoc Teamwork.
Caroline Wang, Arrasy Rahman, Ishan Durugkar, Elad Liebman, and Peter Stone.
In Conference on Neural Information Processing Systems (NeurIPS), December 2024.
[PDF] (1.6MB) [slides.pdf] (1.3MB) [poster.pdf] (1.7MB)
Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent in the scenario. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards expanding the class of scenarios that cooperative learning methods may optimally address, we introduce N-agent ad hoc teamwork (NAHT), where a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates. This paper formalizes the problem, and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammate behaviors by learning representations of teammate behaviors. Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches, and enables out-of-distribution generalization to unseen teammates.
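The abstract's core idea is a policy conditioned on learned representations of teammate behavior. As a rough illustration only (the module names, layer sizes, and architecture below are assumptions for the sketch, not the paper's actual POAM design), an agent-modeling setup of this kind might look like the following in PyTorch:

    import torch
    import torch.nn as nn

    class TeammateEncoder(nn.Module):
        """Hypothetical encoder: maps an observed trajectory of teammate
        behavior to a fixed-size embedding (illustrative only)."""
        def __init__(self, obs_dim: int, embed_dim: int = 32):
            super().__init__()
            self.rnn = nn.GRU(obs_dim, embed_dim, batch_first=True)

        def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
            # obs_seq: (batch, time, obs_dim); keep the final hidden state
            _, h = self.rnn(obs_seq)
            return h.squeeze(0)  # (batch, embed_dim)

    class ConditionedPolicy(nn.Module):
        """Policy head conditioned on the current observation plus the
        teammate embedding, so behavior can adapt to the current team."""
        def __init__(self, obs_dim: int, embed_dim: int, n_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + embed_dim, 64),
                nn.ReLU(),
                nn.Linear(64, n_actions),
            )

        def forward(self, obs, teammate_embed):
            logits = self.net(torch.cat([obs, teammate_embed], dim=-1))
            return torch.distributions.Categorical(logits=logits)

    # Toy usage: embed a short history of teammate observations, then
    # sample an action from the conditioned policy.
    obs_dim, embed_dim, n_actions = 8, 32, 5
    encoder = TeammateEncoder(obs_dim, embed_dim)
    policy = ConditionedPolicy(obs_dim, embed_dim, n_actions)
    history = torch.randn(1, 10, obs_dim)  # 10 timesteps of observations
    z = encoder(history)
    dist = policy(torch.randn(1, obs_dim), z)
    action = dist.sample()

In a policy-gradient training loop, the embedding z would be recomputed as the teammate history grows, letting the controlled agents adapt online to teammates whose number and type vary across episodes.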
@InProceedings{wang-naht-24,
  author    = {Caroline Wang and Arrasy Rahman and Ishan Durugkar and Elad Liebman and Peter Stone},
  title     = {N-Agent Ad Hoc Teamwork},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2024},
  month     = {December},
  location  = {Vancouver, Canada},
  abstract  = {Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent in the scenario. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards expanding the class of scenarios that cooperative learning methods may optimally address, we introduce N-agent ad hoc teamwork (NAHT), where a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates. This paper formalizes the problem, and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammate behaviors by learning representations of teammate behaviors. Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches, and enables out-of-distribution generalization to unseen teammates.},
}