Charles Lee Isbell, Jr., Christian R. Shelton, Michael Kearns, Satinder Singh, and Peter Stone. A Social Reinforcement Learning Agent. In Proceedings of the Fifth International Conference on Autonomous Agents, pp. 377–384, ACM Press, Montreal, Canada, 2001.
BEST PAPER AWARD at Agents-2001
[PDF] 243.5kB  [postscript] 1.7MB
We report on our reinforcement learning work on Cobot, a software agent that resides in the well-known online chat community LambdaMOO. Our initial work on Cobot provided him with the ability to collect social statistics and report them to users in a reactive manner. Here we describe our application of reinforcement learning to allow Cobot to proactively take actions in this complex social environment, and adapt his behavior from multiple sources of human reward. After 5 months of training, Cobot has received 3171 reward and punishment events from 254 different LambdaMOO users, and has learned nontrivial preferences for a number of users. Cobot modifies his behavior based on his current state in an attempt to maximize reward. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.
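The abstract describes Cobot choosing actions based on his current state and adapting from reward and punishment signals issued by many different users. As a rough illustration of that kind of setup (not the paper's actual algorithm or feature set), the Python sketch below assumes a small hand-built state vector, a discrete set of chat actions, a softmax policy over linear scores, and a REINFORCE-style update from per-user rewards; the names (CobotPolicy, ACTIONS, and so on) are hypothetical.

import numpy as np

# Assumed action set for illustration only; the real Cobot actions differ.
ACTIONS = ["null", "topic_change", "social_commentary", "roll_call"]

class CobotPolicy:
    def __init__(self, n_features, n_actions, lr=0.05, rng=None):
        self.w = np.zeros((n_actions, n_features))  # linear preference weights per action
        self.lr = lr
        self.rng = rng or np.random.default_rng(0)

    def action_probs(self, state):
        # Softmax over linear scores: higher-scoring actions are chosen more often.
        scores = self.w @ state
        scores -= scores.max()  # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def act(self, state):
        p = self.action_probs(state)
        return self.rng.choice(len(p), p=p)

    def update(self, state, action, rewards_by_user):
        # Collapse the (possibly conflicting) rewards and punishments from all
        # users into one scalar, then take a REINFORCE-style gradient step.
        r = sum(rewards_by_user.values())
        p = self.action_probs(state)
        for a in range(len(p)):
            grad = ((1.0 if a == action else 0.0) - p[a]) * state
            self.w[a] += self.lr * r * grad

# Toy usage: a 3-dimensional state (e.g. room-activity features), two users reacting.
policy = CobotPolicy(n_features=3, n_actions=len(ACTIONS))
state = np.array([1.0, 0.2, 0.0])
action = policy.act(state)
policy.update(state, action, {"user_A": +1.0, "user_B": -1.0})

The sketch collapses all users' feedback into a single sum for brevity; the paper's handling of multiple reward sources is richer, which is how Cobot ends up learning the per-user preferences the abstract mentions.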
@InProceedings(AA01-cobot,
  Author    = "Isbell, Jr., Charles Lee and Christian R. Shelton and Michael Kearns and Satinder Singh and Peter Stone",
  Title     = "A Social Reinforcement Learning Agent",
  BookTitle = "Proceedings of the Fifth International Conference on Autonomous Agents",
  Year      = "2001",
  Publisher = "ACM Press",
  Address   = "Montreal, Canada",
  Editor    = {J{\"o}rg P. M{\"u}ller and Elisabeth Andre and Sandip Sen and Claude Frasson},
  Pages     = "377--384",
  Abstract  = {We report on our reinforcement learning work on Cobot, a software agent that resides in the well-known online chat community LambdaMOO. Our initial work on Cobot provided him with the ability to collect social statistics and report them to users in a reactive manner. Here we describe our application of reinforcement learning to allow Cobot to proactively take actions in this complex social environment, and adapt his behavior from multiple sources of human reward. After 5 months of training, Cobot has received 3171 reward and punishment events from 254 different LambdaMOO users, and has learned nontrivial preferences for a number of users. Cobot modifies his behavior based on his current state in an attempt to maximize reward. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.},
  wwwnote   = {<b>BEST PAPER AWARD at</b> <a href="http://www.csc.liv.ac.uk/~agents2001/">Agents-2001</a>}
)