Peter Stone's Selected Publications

Multistep Inverse Is Not All You Need

Multistep Inverse Is Not All You Need.
Alexander Levine, Peter Stone, and Amy Zhang.
Reinforcement Learning Journal, 2024.

Abstract

In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) propose the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encoding of the first and last state in a path to predict the first action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations, as well as on high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.
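
As a rough illustration of the multistep-inverse idea described in the abstract (predicting the first action in a path from the encodings of its first and last states), the following is a minimal tabular sketch. It is not the AC-State or ACDF implementation, and it omits the latent forward model that ACDF adds; the authors' code is at the repository linked above. The function names, the toy trajectory, the inclusion of the path length `k` as a conditioning variable, and the hand-written encoder are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def multistep_inverse_examples(observations, actions, max_k):
    """Build (first_obs, last_obs, k, first_action) tuples from one trajectory.

    Assumes len(observations) == len(actions) + 1, where actions[t] moves
    observations[t] to observations[t + 1].
    """
    examples = []
    for t in range(len(actions)):
        for k in range(1, max_k + 1):
            if t + k >= len(observations):
                break  # the path would run past the end of the trajectory
            examples.append((observations[t], observations[t + k], k, actions[t]))
    return examples


def fit_count_predictor(examples, encoder):
    """Tabular stand-in for an action classifier (hypothetical helper).

    Counts how often each first action occurs for every
    (encoded first state, encoded last state, k) triple.
    """
    counts = defaultdict(Counter)
    for first, last, k, action in examples:
        counts[(encoder(first), encoder(last), k)][action] += 1
    return counts


# Toy data (hypothetical): each observation is (controllable position, noise bit).
observations = [(0, 1), (1, 0), (2, 1), (1, 1), (2, 0)]
actions = ["right", "right", "left", "right"]

examples = multistep_inverse_examples(observations, actions, max_k=2)


def encoder(obs):
    # Hypothetical hand-written encoder that keeps the position and drops the noise bit.
    return obs[0]


counts = fit_count_predictor(examples, encoder)
print(counts[(1, 2, 1)])
```

In a learned setting, the count table would be replaced by a classifier trained jointly with the encoder; the sketch only shows how the training tuples are formed and what the predictor conditions on.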

BibTeX Entry

@Article{alexander_levine_RLC_2024,
  author   = {Alexander Levine and Peter Stone and Amy Zhang},
  title    = {Multistep Inverse Is Not All You Need},
  journal  = {Reinforcement Learning Journal},
  year     = {2024},
  abstract = {In real-world control settings, the observation space is often unnecessarily
high-dimensional and subject to time-correlated noise. However, the controllable
dynamics of the system are often far simpler than the dynamics of the raw
observations. It is therefore desirable to learn an encoder to map the
observation space to a simpler space of control-relevant variables. In this work,
we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which
formalizes control problems where observations can be factorized into an
action-dependent latent state which evolves deterministically, and
action-independent time-correlated noise. Lamb et al. (2022) propose the
"AC-State" method for learning an encoder to extract a complete action-dependent
latent state representation from the observations in such problems. AC-State is a
multistep-inverse method, in that it uses the encoding of the first and last
state in a path to predict the first action in the path. However, we identify
cases where AC-State will fail to learn a correct latent representation of the
agent-controllable factor of the state. We therefore propose a new algorithm,
ACDF, which combines multistep-inverse prediction with a latent forward model.
ACDF is guaranteed to correctly infer an action-dependent latent state encoder
for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on
tabular Ex-BMDPs through numerical simulations, as well as on high-dimensional
environments using neural-network-based encoders. Code is available at
https://github.com/midi-lab/acdf.
  },
}
