Peter Stone's Selected Publications

Reducing Sampling Error in Policy Gradient Learning

Reducing Sampling Error in Policy Gradient Learning.
Josiah Hanna and Peter Stone.
In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2019.
This paper contains material that was previously presented at the 2018 NeurIPS Deep Reinforcement Learning Workshop.

Download

[PDF]1.5MB  [slides.pdf]3.1MB  

Abstract

This paper studies a class of reinforcement learning algorithms known as policy gradient methods. Policy gradient methods optimize the performance of a policy by estimating the gradient of the expected return with respect to the policy parameters. One of the core challenges of applying policy gradient methods is obtaining an accurate estimate of this gradient. Most policy gradient methods rely on Monte Carlo sampling to estimate this gradient. When only a limited number of environment steps can be collected, Monte Carlo policy gradient estimates may suffer from sampling error: some samples receive more or less weight than they would in expectation. In this paper, we introduce the Sampling Error Corrected policy gradient estimator, which corrects these inaccurate Monte Carlo weights. Our approach treats the observed data as if it were generated by a different policy than the one that actually generated it. It then applies importance sampling between the two policies, correcting the inaccurate Monte Carlo weights in the process. Under a restrictive set of assumptions, we show that this gradient estimator has lower variance than the Monte Carlo gradient estimator. We also show experimentally that our approach improves the learning speed of two policy gradient methods compared to standard Monte Carlo sampling, even when the theoretical assumptions do not hold.
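To make the reweighting idea concrete, here is a minimal sketch in Python. It assumes a single-state (bandit) problem, a tabular softmax policy, and noise-free per-action rewards. In this special case the maximum-likelihood estimate of the policy that "generated" the data is simply the empirical action frequency pi_D, and the corrected estimator weights each sample by pi_theta(a) / pi_D(a). All function names are hypothetical and not taken from the authors' code; the paper's estimator targets general sequential problems and policy classes, not only this toy setting.

```python
import math

# Hypothetical sketch: one state, tabular softmax policy, per-action rewards.


def softmax(theta):
    """Action probabilities pi_theta(a) for a tabular softmax policy."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, a):
    """Gradient of log pi_theta(a) w.r.t. theta: one_hot(a) - pi_theta."""
    pi = softmax(theta)
    return [(1.0 if b == a else 0.0) - pi[b] for b in range(len(theta))]


def mc_gradient(theta, actions, rewards):
    """Ordinary Monte Carlo estimate: mean of R_i * grad log pi_theta(a_i)."""
    n = len(actions)
    g = [0.0] * len(theta)
    for a, r in zip(actions, rewards):
        for k, d in enumerate(grad_log_pi(theta, a)):
            g[k] += r * d / n
    return g


def sec_gradient(theta, actions, rewards):
    """Sampling-error-corrected sketch: weight each sample by
    pi_theta(a) / pi_D(a), where pi_D is the maximum-likelihood
    (empirical-frequency) policy fit to the observed actions."""
    n = len(actions)
    counts = [0] * len(theta)
    for a in actions:
        counts[a] += 1
    pi_d = [c / n for c in counts]
    pi = softmax(theta)
    g = [0.0] * len(theta)
    for a, r in zip(actions, rewards):
        w = pi[a] / pi_d[a]  # pi_d[a] > 0 because a was observed
        for k, d in enumerate(grad_log_pi(theta, a)):
            g[k] += w * r * d / n
    return g


def exact_gradient(theta, q):
    """Exact gradient sum_a pi_theta(a) q(a) grad log pi_theta(a), for comparison."""
    pi = softmax(theta)
    g = [0.0] * len(theta)
    for a, qa in enumerate(q):
        for k, d in enumerate(grad_log_pi(theta, a)):
            g[k] += pi[a] * qa * d
    return g
```

With theta = [0, 0, 0], rewards q = [1, 0, 2] and the skewed sample of actions [0, 0, 0, 1, 2], the Monte Carlo estimate is [4/15, -1/3, 1/15], while the corrected estimate equals the exact gradient [0, -1/3, 1/3], because dividing by pi_D cancels the over- and under-counting of each action. The exact match is specific to this toy case (every action observed, noise-free rewards); with noisy rewards or unobserved actions the corrected estimate no longer recovers the exact gradient.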

BibTeX Entry

@InProceedings{AAMAS19-Hanna,
  author = {Josiah Hanna and Peter Stone},
  title = {Reducing Sampling Error in Policy Gradient Learning},
  booktitle = {Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
  location = {Montreal, Canada},
  month = {May},
  year = {2019},
  abstract = {
This paper studies a class of reinforcement learning algorithms known as policy 
gradient methods. Policy gradient methods optimize the performance of a policy 
by estimating the gradient of the expected return with respect to the policy 
parameters. One of the core challenges of applying policy gradient methods is 
obtaining an accurate estimate of this gradient. Most policy gradient methods 
rely on Monte Carlo sampling to estimate this gradient. When only a limited 
number of environment steps can be collected, Monte Carlo policy gradient 
estimates may suffer from sampling error: some samples receive more or less 
weight than they would in expectation. In this paper, we introduce the Sampling 
Error Corrected policy gradient estimator, which corrects these inaccurate Monte 
Carlo weights. Our approach treats the observed data as if it were generated by 
a different policy than the one that actually generated it. It then applies 
importance sampling between the two policies, correcting the inaccurate Monte 
Carlo weights in the process. Under a restrictive set of assumptions, we show 
that this gradient estimator has lower variance than the Monte Carlo gradient 
estimator. We also show experimentally that our approach improves the learning 
speed of two policy gradient methods compared to standard Monte Carlo sampling, 
even when the theoretical assumptions do not hold.
  },
  wwwnote={This paper contains material that was previously presented at the 2018 NeurIPS Deep Reinforcement Learning Workshop.}
}
