Learning and Using Models.
Todd Hester and Peter Stone.
In Marco Wiering and Martijn van Otterlo, editors, Reinforcement Learning: State of the Art, Springer Verlag, Berlin, Germany, 2011.
[PDF] 474.7kB [postscript] 1.0MB
As opposed to model-free RL methods, which learn directly from experience in the domain, model-based methods learn a model of the transition and reward functions of the domain on-line and plan a policy using this model. Once the method has learned an accurate model, it can plan an optimal policy on this model without any further experience in the world. Therefore, when model-based methods are able to learn a good model quickly, they frequently have improved sample efficiency over model-free methods, which must continue taking actions in the world for values to propagate back to previous states. Another advantage of model-based methods is that they can use their models to plan multi-step exploration trajectories. In particular, many methods drive the agent to explore where there is uncertainty in the model, so as to learn the model as fast as possible. In this chapter, we survey some of the types of models used in model-based methods and ways of learning them, as well as methods for planning on these models. In addition, we examine the typical architectures for combining model learning and planning, which vary depending on whether the designer wants the algorithm to run on-line, in batch mode, or in real-time. One of the main performance criteria for these algorithms is sample complexity, or how many actions the algorithm must take to learn. We examine the sample efficiency of a few methods, which are highly dependent on having intelligent exploration mechanisms. We survey some approaches to solving the exploration problem, including Bayesian methods that maintain a belief distribution over possible models to explicitly measure uncertainty in the model. We show some empirical comparisons of various model-based and model-free methods on two example domains before concluding with a survey of current research on scaling these methods up to larger domains with improved sample and computational complexity.
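To illustrate the loop the abstract describes (learn a model on-line, plan on it, and explore where the model is uncertain), the following is a minimal Python sketch, not taken from the chapter itself: a tabular agent that estimates transition and reward models from visit counts, plans by value iteration, and uses an R-max-style optimistic bonus for unvisited state-action pairs. The toy chain domain and all constants and names here are illustrative assumptions.

# Minimal sketch of tabular model-based RL with optimistic exploration.
# Assumptions (not from the chapter): a 5-state chain MDP, R-max-style
# optimism, value-iteration planning after every step.
from collections import defaultdict

N_STATES, N_ACTIONS, GAMMA, RMAX = 5, 2, 0.95, 1.0
counts = defaultdict(int)         # visit counts for (s, a)
trans_counts = defaultdict(int)   # counts for (s, a, s')
reward_sums = defaultdict(float)  # summed rewards for (s, a)

def step(s, a):
    """Toy chain MDP: action 1 moves right (reward at the far end), action 0 resets."""
    if a == 1:
        s2 = min(s + 1, N_STATES - 1)
        return s2, 1.0 if s2 == N_STATES - 1 else 0.0
    return 0, 0.0

def plan():
    """Value iteration on the learned model; unvisited (s, a) pairs get an
    optimistic R-max value, which drives exploration toward model uncertainty."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(100):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                n = counts[(s, a)]
                if n == 0:
                    Q[s][a] = RMAX / (1 - GAMMA)  # optimism under uncertainty
                    continue
                q = reward_sums[(s, a)] / n       # estimated mean reward
                for s2 in range(N_STATES):
                    p = trans_counts[(s, a, s2)] / n  # estimated transition prob.
                    q += GAMMA * p * max(Q[s2])
                Q[s][a] = q
    return Q

s = 0
for t in range(500):
    Q = plan()                                    # re-plan on the current model
    a = max(range(N_ACTIONS), key=lambda a: Q[s][a])
    s2, r = step(s, a)
    counts[(s, a)] += 1                           # update the learned model
    trans_counts[(s, a, s2)] += 1
    reward_sums[(s, a)] += r
    s = s2

Q = plan()
print("Greedy action per state:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])

Because the agent plans on its learned model rather than waiting for values to propagate through real experience, it needs far fewer environment steps than a comparable model-free learner on this chain, which is the sample-efficiency argument made in the abstract; the re-planning-every-step schedule is only one of the architectures (on-line, batch, real-time) the chapter discusses.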
@inCollection{RLSOTA11,
  author    = {Todd Hester and Peter Stone},
  title     = {Learning and Using Models},
  booktitle = {Reinforcement Learning: State of the Art},
  editor    = {Marco Wiering and Martijn van Otterlo},
  year      = {2011},
  address   = {Berlin, Germany},
  publisher = {Springer Verlag},
  abstract  = "As opposed to model-free RL methods, which learn directly from experience in the domain, model-based methods learn a model of the transition and reward functions of the domain on-line and plan a policy using this model. Once the method has learned an accurate model, it can plan an optimal policy on this model without any further experience in the world. Therefore, when model-based methods are able to learn a good model quickly, they frequently have improved sample efficiency over model-free methods, which must continue taking actions in the world for values to propagate back to previous states. Another advantage of model-based methods is that they can use their models to plan multi-step exploration trajectories. In particular, many methods drive the agent to explore where there is uncertainty in the model, so as to learn the model as fast as possible. In this chapter, we survey some of the types of models used in model-based methods and ways of learning them, as well as methods for planning on these models. In addition, we examine the typical architectures for combining model learning and planning, which vary depending on whether the designer wants the algorithm to run on-line, in batch mode, or in real-time. One of the main performance criteria for these algorithms is sample complexity, or how many actions the algorithm must take to learn. We examine the sample efficiency of a few methods, which are highly dependent on having intelligent exploration mechanisms. We survey some approaches to solving the exploration problem, including Bayesian methods that maintain a belief distribution over possible models to explicitly measure uncertainty in the model. We show some empirical comparisons of various model-based and model-free methods on two example domains before concluding with a survey of current research on scaling these methods up to larger domains with improved sample and computational complexity.",
}