Peter Stone's Selected Publications



Combining Manual Feedback with Subsequent MDP Reward Signals for Reinforcement Learning

Combining Manual Feedback with Subsequent MDP Reward Signals for Reinforcement Learning.
W. Bradley Knox and Peter Stone.
In Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), May 2010.
Winner of the Pragnesh Jay Modi BEST STUDENT PAPER AWARD (and best paper award nominee).
The TAMER project page with videos of TAMER in action.

Download

[PDF] 422.9 kB  [Postscript] 3.5 MB

Abstract

As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible TAMER+RL methods for combining a previously learned human reinforcement function, H, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from these TAMER+RL algorithms indicate better final performance and better cumulative performance than either a TAMER agent or an RL agent alone.
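The abstract describes combining a previously learned human reinforcement function, H, with MDP reward inside an RL algorithm. One simple way such a combination can work (a minimal sketch, not necessarily one of the paper's eight tested methods; the function and parameter names here are illustrative assumptions) is to shape the environment reward with a weighted H term inside a tabular Q-learning update:

```python
from collections import defaultdict

def tamer_rl_q_update(Q, H, s, a, r, s_next, actions,
                      alpha=0.1, gamma=0.99, beta=1.0):
    """One Q-learning step on the shaped reward r + beta * H(s, a).

    Q       : dict mapping (state, action) -> value (tabular Q-function)
    H       : callable (state, action) -> predicted human reinforcement,
              as learned beforehand by a TAMER agent
    beta    : weight on the human reinforcement term (hypothetical knob;
              annealing it toward zero over training is one plausible variant)
    """
    shaped_r = r + beta * H(s, a)                      # MDP reward + human term
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy bootstrap value
    Q[(s, a)] += alpha * (shaped_r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Toy usage: H prefers moving "right", so that action's value rises
# even before any MDP reward has been observed.
Q = defaultdict(float)
H = lambda s, a: 1.0 if a == "right" else -1.0
v = tamer_rl_q_update(Q, H, s=0, a="right", r=0.0, s_next=1,
                      actions=["left", "right"])
```

Because the H term enters only through the reward signal, the base RL algorithm is unchanged; as beta shrinks, the update reduces to standard Q-learning on the MDP reward alone.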

BibTeX Entry

@InProceedings{AAMAS10-knox,
  author="W.\ Bradley Knox and Peter Stone",
  title="Combining Manual Feedback with Subsequent {MDP} Reward Signals for Reinforcement Learning",
  booktitle="Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010)",
  month="May",
  year="2010",
  abstract={As learning agents move from research labs to the real
	world, it is increasingly important that human users, including those
	without programming skills, be able to teach agents desired behaviors.
	Recently, the TAMER framework was introduced for designing agents that
	can be interactively shaped by human trainers who give only positive
	and negative feedback signals. Past work on TAMER showed that shaping
	can greatly reduce the sample complexity required to learn a good
	policy, can enable lay users to teach agents the behaviors they
	desire, and can allow agents to learn within a Markov Decision Process
	(MDP) in the absence of a coded reward function.  However, TAMER does
	not allow this human training to be combined with autonomous learning
	based on such a coded reward function.  This paper leverages the fast
	learning exhibited within the TAMER framework to hasten a
	reinforcement learning (RL) algorithm's climb up the learning curve,
	effectively demonstrating that human reinforcement and MDP reward can
	be used in conjunction with one another by an autonomous agent. We
	tested eight plausible TAMER+RL methods for combining a previously
	learned human reinforcement function, H, with MDP reward in a
	reinforcement learning algorithm. This paper identifies which of these
	methods are most effective and analyzes their strengths and
	weaknesses. Results from these TAMER+RL algorithms indicate better
	final performance and better cumulative performance than either a
	TAMER agent or an RL agent alone.  },
  wwwnote={Winner of the Pragnesh Jay Modi <b>BEST STUDENT PAPER AWARD</b> (and best paper award nominee).<br>The <a href="http://www.cs.utexas.edu/~bradknox/TAMER.html">TAMER</a> project page with <a href="http://www.cs.utexas.edu/~bradknox/TAMER_in_Action.html">videos</a> of TAMER in action.<br><a href="http://www.cse.yorku.ca/AAMAS2010//">AAMAS-2010</a>},
}

Generated by bib2html.pl (written by Patrick Riley) on Sun Nov 24, 2024 20:24:55