Peter Stone's Selected Publications



TAMER: Training an Agent Manually via Evaluative Reinforcement

TAMER: Training an Agent Manually via Evaluative Reinforcement.
W. Bradley Knox and Peter Stone.
In IEEE 7th International Conference on Development and Learning, August 2008.
ICDL-2008
Also available in IEEE Xplore, 9-12 Aug. 2008, pages 292-297

Download

[PDF] 1.1MB  [postscript] 12.8MB

Abstract

Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative Reinforcement (TAMER) that allows a human to train a learning agent to perform a common class of complex tasks simply by giving scalar reward signals in response to the agent's observed actions. Specifically, in sequential decision making tasks, an agent models the human's reward function and chooses actions that it predicts will receive the most reward. Our novel algorithm is fully implemented and tested on the game Tetris. Leveraging the human trainers' feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.
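To make the core idea concrete, here is a minimal, illustrative sketch of a TAMER-style agent: it keeps a supervised model of the human trainer's reward and acts greedily with respect to that model's predictions, rather than with respect to a discounted future return as in standard reinforcement learning. This is not the paper's implementation; the linear feature model, the feature function phi, and the learning rate are placeholder assumptions, and the credit-assignment handling for delayed human feedback described in the full TAMER framework is omitted here.

```python
import numpy as np


class TamerAgent:
    """Illustrative sketch of a TAMER-style agent (not the paper's code).

    The agent models the human trainer's reward as a linear function
    H(s, a) ~ w . phi(s, a) and, at each step, greedily chooses the action
    with the highest predicted human reward.
    """

    def __init__(self, feature_fn, n_features, learning_rate=0.01):
        self.feature_fn = feature_fn          # phi(state, action) -> np.ndarray (assumed)
        self.weights = np.zeros(n_features)   # linear model of the human's reward
        self.learning_rate = learning_rate    # placeholder step size

    def predict_human_reward(self, state, action):
        return float(np.dot(self.weights, self.feature_fn(state, action)))

    def choose_action(self, state, actions):
        # Greedy with respect to predicted *human* reward; no discounting
        # of future return, unlike a conventional RL value function.
        return max(actions, key=lambda a: self.predict_human_reward(state, a))

    def update(self, state, action, human_reward):
        # Supervised update toward the trainer's scalar feedback:
        # an incremental gradient step on the squared prediction error.
        features = self.feature_fn(state, action)
        error = human_reward - float(np.dot(self.weights, features))
        self.weights += self.learning_rate * error * features
```

A training loop under this sketch would repeatedly call choose_action on the current state, execute the chosen action, and, whenever the human presses a feedback key, pass that scalar signal to update for the recently observed state-action pair.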

BibTeX Entry

@InProceedings{ICDL08-knox,
 author="W.~Bradley Knox and Peter Stone",
 title="{TAMER}: {T}raining an {A}gent {M}anually via {E}valuative {R}einforcement",
 booktitle="IEEE 7th International Conference on Development and Learning",
 month="August",
 year="2008",
 abstract={Though computers have surpassed humans at many tasks, especially
  computationally intensive ones, there are many tasks for which human
  expertise remains necessary and/or useful.  For such tasks, it is
  desirable for a human to be able to transmit knowledge to a learning
  agent as quickly and effortlessly as possible, and, ideally, without
  any knowledge of the details of the agent's learning process.  This
  paper proposes a general framework called Training an Agent
  Manually via Evaluative Reinforcement (TAMER) that allows a human to
  train a learning agent to perform a common class of complex tasks
  simply by giving scalar reward signals in response to the agent's
  observed actions.  Specifically, in sequential decision making tasks,
  an agent models the human's reward function and chooses actions that
  it predicts will receive the most reward.  Our novel algorithm is
  fully implemented and tested on the game Tetris.  Leveraging the human
  trainers' feedback, the agent learns to clear an average of more than
  50 lines by its third game, an order of magnitude faster than the best
  autonomous learning agents.},
 wwwnote={<a href="http://www.icdl08.org/">ICDL-2008</a><br>Also available in <a href="http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=4640795&isYear=2008&count=52&page=1&ResultStart=25">IEEE Xplore</a>, 9-12 Aug. 2008 Pages:292 - 297},
}

Generated by bib2html.pl (written by Patrick Riley) on Sun Nov 24, 2024 20:24:56