Machine Learning Research Group | University of Texas

Publications: Reinforcement Learning

Reinforcement Learning tasks are learning problems where the desired behavior is not known; only sparse feedback on how well the agent is doing is provided. Reinforcement Learning techniques include value-function and policy iteration methods (note that although evolutionary computation and neuroevolution can also be seen as reinforcement learning methods, they are presented separately in this area hierarchy).

Hide abstracts

Compositional Instruction Following with Language Models and Reinforcement Learning
[Details] [PDF]
Vanya Cohen, Geraud Nangue Tasse, Nakul Gopalan, Steven James, Matthew Gombolay, Raymond Mooney, Benjamin Rosman
In Transactions on Machine Learning Research, December 2024.
Combining reinforcement learning with language grounding is challenging as the agent needs to explore the environment while simultaneously learning multiple language-conditioned tasks. To address this, we introduce a novel method: the compositionally-enabled reinforcement learning language agent (CERLLA). Our method reduces the sample complexity of tasks specified with language by leveraging compositional policy representations and a semantic parser trained using reinforcement learning and in-context learning. We evaluate our approach in an environment requiring function approximation and demonstrate compositional generalization to novel tasks. Our method significantly outperforms the previous best non-compositional baseline in terms of sample complexity on 162 tasks designed to test compositional generalization. Our model attains a higher success rate and learns in fewer steps than the non-compositional baseline. It reaches a success rate equal to an oracle policy’s upper-bound performance of 92 percent. With the same number of environment steps, the baseline only reaches a success rate of 80 percent.
ML ID: 436
Using Natural Language to Aid Task Specification in Sequential Decision Making Problems
[Details] [PDF] [Slides (PDF)] [Video]
Prasoon Goyal
October 2021. Ph.D. Proposal.
Intelligent agents that can help humans accomplish everyday tasks, such as a personal robot at home or a robot in a work environment, is a long-standing goal of artificial intelligence. One of the requirements for such general-purpose agents is the ability to teach them new tasks or skills relatively easily. Common approaches to teaching agents new skills include reinforcement learning (RL) and imitation learning (IL). However, specifying the task to the learning agent, i.e. designing effective reward functions for reinforcement learning and providing demonstrations for imitation learning, are often cumbersome and time-consuming. We aim to use natural language as an auxiliary signal to aid task specification, which reduces the burden on the end user. To make reward design easier, we propose a novel framework that is used to generate language-based rewards in addition to the extrinsic rewards from the environment for faster policy training using RL. To ameliorate the problem of providing demonstrations, we propose a new setting that enables an agent to learn a new task without demonstrations in an IL setting, given a demonstration from a related task and a natural language description of the difference between the desired task and the demonstrated task. The primary contributions of this dissertation will be new frameworks that enable incorporating natural language in RL and IL, which would enable non-expert users to specify new tasks to intelligent agents more conveniently.
ML ID: 398
Supervised Attention from Natural Language Feedback for Reinforcement Learning
[Details] [PDF]
Clara Cecilia Cannon
Masters Thesis, Department of Computer Science, The University of Texas at Austin, May 2021.
In this paper, we introduce a new approach to Reinforcement Learning (RL) called “supervised attention” from human feedback which focuses on novel task learning from human interaction on relevant features of the environment, which we hypothesize will allow for effective learning from limited training data. We wanted to answer the following question: does the addition of language to existing RL frameworks improve agent learning? We wanted to show that language helps the agent pick out the most important features in its perception. We tested many methods for implementing this concept and settled on incorporating language feedback via a template matching scheme. While more sophisticated techniques, such as attention, would be better at grounding the language, we discovered this task is non-trivial for our choice of environment. Using deep learning methods, we translate human linguistic narration to a saliency map over the perceptual field. This saliency map is used to inform a deep-reinforcement learning system which features in the visual observation are most important relative to its position in the environment and optimize task learning. We establish a baseline model using deep TAMER and test our framework on Montezuma’s Revenge, the most difficult game in theAtari Arcade suite. However, our final framework demonstrates the incompatibility of language in the Atari suite in a supervised attention setting. The ultimate result showed that as long as the agent’s position in the observation was clear, the model ignores surrounding contextual information, regardless of potential benefit. We conclude that the Atari network of games is unsuitable for grounding natural language in high-dimensional state spaces. Further development of sophisticated simulations is required.
ML ID: 396
Dialog Policy Learning for Joint Clarification and Active Learning Queries
[Details] [PDF] [Slides (PDF)] [Poster] [Video]
Aishwarya Padmakumar, Raymond J. Mooney
In The AAAI Conference on Artificial Intelligence (AAAI), February 2021.
Intelligent systems need to be able to recover from mistakes, resolve uncertainty, and adapt to novel concepts not seen during training. Dialog interaction can enable this by the use of clarifications for correction and resolving uncertainty, and active learning queries to learn new concepts encountered during operation. Prior work on dialog systems has either focused on exclusively learning how to perform clarification/ information seeking, or to perform active learning. In this work, we train a hierarchical dialog policy to jointly perform both clarification and active learning in the context of an interactive language-based image retrieval task motivated by an on-line shopping application, and demonstrate that jointly learning dialog policies for clarification and active learning is more effective than the use of static dialog policies for one or both of these functions.
ML ID: 385
Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems
[Details] [PDF] [Slides (PDF)]
Aishwarya Padmakumar
PhD Thesis, Department of Computer Science, The University of Texas at Austin, August 2020.
Natural language interfaces have the potential to make various forms of technology, including mobile phones and computers as well as robots or other machines such as ATMs and self-checkout counters, more accessible and less intimidating to users who are unfamiliar or uncomfortable with other types of interfaces. In particular, natural language understanding systems on physical robots face a number of challenges, including the need to ground language in perception, the ability to adapt to changes in the environment and novel uses of language, and to deal with uncertainty in understanding. To effectively handle these challenges, such systems need to perform lifelong learning - continually updating the scope and predictions of the model with user interactions. In this thesis, we discuss ways in which dialog interaction with users can be used to improve grounded natural language understanding systems, motivated by service robot applications. We focus on two types of queries that can be used in such dialog systems – active learning queries to elicit knowledge about the environment that can be used to improve perceptual models, and clarification questions that confirm the system’s hypotheses, or elicit specific information required to complete a task. Our goal is to build a system that can learn how to interact with users balancing a quick completion of tasks desired by the user with asking additional active learning questions to improve the underlying grounded language understanding components. We present work on jointly improving semantic parsers from and learning a dialog policy for clarification dialogs, that improve a robot’s ability to understand natural language commands. We introduce the framework of opportunistic active learning, where a robot introduces opportunistic queries, that may not be immediately relevant, into an interaction in the hope of improving performance in future interactions. We demonstrate the usefulness of this framework in learning to ground natural language descriptions of objects, and learn a dialog policy for such interactions. We also learn dialog policies that balance task completion, opportunistic active learning, and attribute-based clarification questions. Finally, we attempt to expand this framework to different types of underlying models of grounded language understanding.
ML ID: 389
PixL2R: Guiding Reinforcement Learning using Natural Language by Mapping Pixels to Rewards
[Details] [PDF]
Prasoon Goyal, Scott Niekum, Raymond J. Mooney
In 4th Conference on Robot Learning (CoRL), November 2020. Also presented on the 1st Language in Reinforcement Learning (LaReL) Workshop at ICML, July 2020 (Best Paper Award), the 6th Deep Reinforcement Learning Workshop at Neural Information Processing Systems (NeurIPS), Dec 2020.
Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environment, and/or assume some structure in the natural language commands. In this work, we propose a model that directly maps pixels to rewards, given a free-form natural language description of the task, which can then be used for policy training. Our experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improve learning. Further, we analyze the resulting framework using multiple ablation experiments to better understand the nature of these improvements.
ML ID: 388
Evaluating the Robustness of Natural Language Reward Shaping Models to Spatial Relations
[Details] [PDF] [Slides (PPT)] [Slides (PDF)]
Antony Yun
May 2020. Undergraduate Honors Thesis, Computer Science Department, University of Texas at Austin.
As part of an effort to bridge the gap between using reinforcement learning in simulation and in the real world, we probe whether current reward shaping models are able to encode relational data between objects in the environment. We construct an augmented dataset for controlling a robotic arm in the Meta-World platform to test whether current models are able to discriminate between target objects based on their relations. We found that state of the art models are indeed expressive enough to achieve performance comparable to the gold standard, so this specific experiment did not uncover any obvious shortcomings.
ML ID: 384
Using Natural Language for Reward Shaping in Reinforcement Learning
[Details] [PDF] [Slides (PDF)] [Poster]
Prasoon Goyal, Scott Niekum, Raymond J. Mooney
In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.
Recent reinforcement learning (RL) approaches have shown strong performance in complex domains such as Atari games, but are often highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. However, designing appropriate shaping rewards is known to be difficult as well as time-consuming. In this work, we address this problem by using natural language instructions to perform reward shaping. We propose the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent. These intermediate language-based rewards can seamlessly be integrated into any standard reinforcement learning algorithm. We experiment with Montezuma’s Revenge from the Atari Learning Environment, a popular benchmark in RL. Our experiments on a diverse set of 15 tasks demonstrate that, for the same number of interactions with the environment, language-based rewards lead to successful completion of the task 60 % more often on average, compared to learning without language.
ML ID: 376
Learning a Policy for Opportunistic Active Learning
[Details] [PDF]
Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP-18), Brussels, Belgium, November 2018.
Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.
ML ID: 368
Integrated Learning of Dialog Strategies and Semantic Parsing
[Details] [PDF]
Aishwarya Padmakumar and Jesse Thomason and Raymond J. Mooney
In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), 547--557, Valencia, Spain, April 2017.
Natural language understanding and dialog management are two integral components of interactive dialog systems. Previous research has used machine learning techniques to individually optimize these components, with different forms of direct and indirect supervision. We present an approach to integrate the learning of both a dialog strategy using reinforcement learning, and a semantic parser for robust natural language understanding, using only natural dialog interaction for supervision. Experimental results on a simulated task of robot instruction demonstrate that joint learning of both components improves dialog performance over learning either of these components alone.
ML ID: 342
Using Active Relocation to Aid Reinforcement Learning
[Details] [PDF]
Lilyana Mihalkova and Raymond Mooney
In Prodeedings of the 19th International FLAIRS Conference (FLAIRS-2006), 580-585, Melbourne Beach, FL, May 2006.
We propose a new framework for aiding a reinforcement learner by allowing it to relocate, or move, to a state it selects so as to decrease the number of steps it needs to take in order to develop an effective policy. The framework requires a minimal amount of human involvement or expertise and assumes a cost for each relocation. Several methods for taking advantage of the ability to relocate are proposed, and their effectiveness is tested in two commonly-used domains.
ML ID: 166
Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer
[Details] [PDF]
Gregory Kuhlmann, Peter Stone, Raymond J. Mooney, and Jude W. Shavlik
In The AAAI-2004 Workshop on Supervisory Control of Learning and Adaptive Systems, July 2004.
We describe our current efforts towards creating a reinforcement learner that learns both from reinforcements provided by its environment and from human-generated advice. Our research involves two complementary components: (a) mapping advice expressed in English to a formal advice language and (b) using advice expressed in a formal notation in a reinforcement learner. We use a subtask of the challenging RoboCup simulated soccer task as our testbed.
ML ID: 151