Peter Stone's Selected Publications

On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning.
Matthew Hausknecht and Peter Stone.
In Deep Reinforcement Learning: Frontiers and Challenges, IJCAI Workshop, July 2016.

Abstract

Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability compared to using exclusively one or the other. The same technique applied to DQN in a discrete action space drastically slows down learning. Our findings raise questions about the nature of on-policy and off-policy bootstrap and Monte Carlo updates and their relationship to deep reinforcement learning methods.
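
The abstract does not say how the two kinds of update targets are combined. As a rough illustration only, the sketch below assumes a simple linear interpolation between an on-policy Monte Carlo return and a one-step bootstrap target. The weight beta and all function names are hypothetical and are not taken from this page. The caller supplies the bootstrap values: for a DQN-style learner these would typically be the maximum estimated Q-value in the next state, and for a DDPG-style learner the critic's estimate for the actor's next action.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted return from each step to the end of a finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def bootstrap_targets(
    rewards: Sequence[float], next_values: Sequence[float], gamma: float
) -> List[float]:
    """One-step targets r_t + gamma * (estimated value of the next state).

    next_values[t] is supplied by the caller and should be 0.0 when the
    transition at step t ends the episode.
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have equal length")
    return [r + gamma * v for r, v in zip(rewards, next_values)]


def mixed_targets(
    rewards: Sequence[float],
    next_values: Sequence[float],
    gamma: float,
    beta: float,
) -> List[float]:
    """Assumed mixing rule: beta * Monte Carlo + (1 - beta) * bootstrap.

    beta = 1.0 gives pure Monte Carlo targets; beta = 0.0 gives pure
    one-step bootstrap targets. beta is a hypothetical parameter.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    mc = monte_carlo_returns(rewards, gamma)
    td = bootstrap_targets(rewards, next_values, gamma)
    return [beta * m + (1.0 - beta) * b for m, b in zip(mc, td)]


if __name__ == "__main__":
    # A three-step episode with made-up rewards and value estimates.
    episode_rewards = [1.0, 0.0, 2.0]
    estimated_next_values = [4.0, 2.0, 0.0]
    print(mixed_targets(episode_rewards, estimated_next_values, gamma=0.5, beta=0.5))
```

Under this assumed rule, sweeping beta between 0 and 1 moves the target from fully bootstrapped to fully Monte Carlo, which is one way to picture the comparison between mixed and exclusive targets that the abstract describes.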

BibTeX Entry

@InProceedings{DeepRL16-hausknecht,
  author = {Matthew Hausknecht and Peter Stone},
  title = {On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning},
  booktitle = {Deep Reinforcement Learning: Frontiers and Challenges, IJCAI Workshop},
  location = {New York},
  month = {July},
  year = {2016},
  abstract = {
Temporal-difference-based deep-reinforcement learning methods have
typically been driven by off-policy, bootstrap Q-Learning updates. In
this paper, we investigate the effects of using on-policy, Monte Carlo
updates. Our empirical results show that for the DDPG algorithm in a
continuous action space, mixing on-policy and off-policy update
targets exhibits superior performance and stability compared to using
exclusively one or the other. The same technique applied to DQN in a
discrete action space drastically slows down learning. Our findings
raise questions about the nature of on-policy and off-policy bootstrap
and Monte Carlo updates and their relationship to deep reinforcement
learning methods.
  },
}
