TAMER: Training an Agent Manually via Evaluative Reinforcement.
W. Bradley Knox and Peter Stone.
In IEEE 7th International Conference on Development and Learning, August 2008.
ICDL-2008
Also available in IEEE Xplore, 9-12 Aug. 2008, pages 292-297.
Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative Reinforcement (TAMER) that allows a human to train a learning agent to perform a common class of complex tasks simply by giving scalar reward signals in response to the agent's observed actions. Specifically, in sequential decision making tasks, an agent models the human's reward function and chooses actions that it predicts will receive the most reward. Our novel algorithm is fully implemented and tested on the game Tetris. Leveraging the human trainers' feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.
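The core idea in the abstract, an agent that models the human's reward function and picks the action it predicts will be rewarded most, can be illustrated with a small sketch. This is not the paper's algorithm or its Tetris setup: the linear model, the `featurize` function, the step size, the three-action toy task, and the `simulated_trainer` standing in for a human are all assumptions made for illustration.

```python
import random


class HumanRewardModel:
    """Hypothetical linear model of the trainer's reward: H(s, a) ~ w . phi(s, a)."""

    def __init__(self, n_features, step_size=0.1):
        self.w = [0.0] * n_features
        self.step_size = step_size  # assumed learning rate

    def predict(self, features):
        # Predicted human reward for one state-action feature vector.
        return sum(wi * fi for wi, fi in zip(self.w, features))

    def update(self, features, human_reward):
        # Incremental gradient step toward the scalar feedback just received.
        error = human_reward - self.predict(features)
        self.w = [wi + self.step_size * error * fi
                  for wi, fi in zip(self.w, features)]


def choose_action(model, state, actions, featurize):
    # Greedy choice: the action with the highest predicted human reward.
    return max(actions, key=lambda a: model.predict(featurize(state, a)))


# Toy task (assumption): three actions, and the trainer approves only of action 2.
ACTIONS = [0, 1, 2]


def featurize(state, action):
    # One-hot encoding of the action; the state is ignored in this toy example.
    return [1.0 if action == a else 0.0 for a in ACTIONS]


def simulated_trainer(state, action):
    # Stand-in for a human pressing a "good" (+1) or "bad" (-1) key.
    return 1.0 if action == 2 else -1.0


def train(steps=50, seed=0):
    rng = random.Random(seed)
    model = HumanRewardModel(len(ACTIONS))
    for _ in range(steps):
        state = rng.random()
        action = choose_action(model, state, ACTIONS, featurize)
        feedback = simulated_trainer(state, action)
        model.update(featurize(state, action), feedback)
    return model
```

In this sketch the agent never consults an environment reward; the trainer's scalar feedback is the only learning signal, which is the property the abstract emphasizes. Negative feedback lowers the predicted reward of the actions tried first, so the greedy choice moves on until it finds the action the trainer approves of.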
@InProceedings{ICDL08-knox,
  author="W.~Bradley Knox and Peter Stone",
  title="{TAMER}: {T}raining an {A}gent {M}anually via {E}valuative {R}einforcement",
  booktitle="IEEE 7th International Conference on Development and Learning",
  month="August",
  year="2008",
  abstract={Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative Reinforcement (TAMER) that allows a human to train a learning agent to perform a common class of complex tasks simply by giving scalar reward signals in response to the agent's observed actions. Specifically, in sequential decision making tasks, an agent models the human's reward function and chooses actions that it predicts will receive the most reward. Our novel algorithm is fully implemented and tested on the game Tetris. Leveraging the human trainers' feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.},
  wwwnote={<a href="http://www.icdl08.org/">ICDL-2008</a><br>Also available in <a href="http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=4640795&isYear=2008&count=52&page=1&ResultStart=25">IEEE Xplore</a>, 9-12 Aug. 2008 Pages:292 - 297},
}