Imitation Learning from Observation

Faraz Torabi

Abstract

Advances in robotics have resulted in increases in both the availability of robots and their complexity, a situation that necessitates automating both the execution and acquisition of robot behaviors. For this purpose, multiple machine learning frameworks have been proposed, including reinforcement learning and imitation learning. Imitation learning in particular has the advantage of not requiring a human engineer to attempt the difficult process of cost function design necessary in reinforcement learning. Moreover, compared to reinforcement learning, imitation learning typically requires less exploration time before an acceptable behavior is learned. These advantages exist because, in the imitation learning framework, the learning agent has access to an expert agent that demonstrates how the task should be performed. Broadly speaking, this framework is limited by the requirement that the learner have access not only to the states of the expert (e.g., observable quantities such as spatial location), but also to its actions (e.g., internal control signals such as motor commands). This constraint is limiting in the sense that it prevents the agent from taking advantage of potentially rich demonstration resources that do not contain action information, e.g., YouTube videos. To alleviate this restriction, Imitation Learning from Observation (IfO) has recently been introduced as an imitation learning framework that explicitly seeks to learn behaviors by observing state-only expert demonstrations.
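
To make the distinction concrete, the following is a minimal illustrative sketch (not taken from the dissertation; the container names are hypothetical) contrasting the demonstration data available in conventional imitation learning with the state-only demonstrations assumed in IfO:

```python
# Illustrative sketch only: hypothetical containers contrasting conventional
# imitation learning demonstrations (states and actions) with the state-only
# demonstrations assumed in imitation learning from observation (IfO).
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StateActionDemonstration:
    """Conventional imitation learning: the expert's actions are recorded."""
    states: List[Sequence[float]]   # observable quantities, e.g., spatial locations
    actions: List[Sequence[float]]  # internal control signals, e.g., motor commands


@dataclass
class StateOnlyDemonstration:
    """IfO: only the expert's states are available, as in a video of the task."""
    states: List[Sequence[float]]
```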

The IfO problem has two main components: (1) perception of the demonstrations, and (2) learning a control policy. This thesis focuses primarily on the second component and introduces multiple algorithms to solve the control aspect of the problem. Each of the proposed algorithms has certain advantages and disadvantages over the others in terms of performance, stability, and sample complexity. Moreover, some of the algorithms are model-based (i.e., a model of the dynamics of the environment is learned as part of the imitation learning process), and some are model-free. In general, model-based algorithms are more sample-efficient, whereas model-free algorithms typically achieve stronger final performance. Although the focus of this thesis is on the control aspect of IfO, two algorithms are also introduced that integrate a perception module into one of the control algorithms, demonstrating that the control algorithm can be adapted to the general IfO problem.
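
As a rough sketch of the model-based versus model-free distinction described above (again with hypothetical names; these are not the algorithms introduced in the thesis), a model-based learner fits an explicit dynamics model of the environment, while a model-free learner updates a policy directly from experience:

```python
# Minimal sketch (not the thesis's algorithms) of the interface difference
# between a model-based learner, which fits a dynamics model of the
# environment, and a model-free learner, which maintains only a policy.
# All class and method names are hypothetical illustrations.
import numpy as np


class LearnedDynamicsModel:
    """Model-based component: predicts the next state from (state, action).
    A linear least-squares fit stands in for whatever model class is used.
    A learned model can generate simulated experience, which is one reason
    model-based methods tend to be more sample-efficient."""

    def fit(self, states, actions, next_states):
        X = np.hstack([states, actions])
        # Solve next_state ~ X @ W in the least-squares sense.
        self.W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

    def predict(self, state, action):
        return np.concatenate([state, action]) @ self.W


class ModelFreePolicy:
    """Model-free component: maps states to actions and is updated directly
    from interaction, with no explicit dynamics model."""

    def __init__(self, state_dim, action_dim):
        self.K = np.zeros((state_dim, action_dim))

    def act(self, state):
        return state @ self.K
```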

Downloads

Full Dissertation
Slides (pptx)
Slides (pdf)