Research

The SPQR team at Bremen 2006

My first RoboCup, SPQR team, Bremen 2006.

The underwater view of the pool at CIRS in Girona

Arnau Carrera, me, and Nawid Jamali, at CIRS Girona, with the AUV Girona500

Two students of my class, Matt B. and Robert L., developed a software interface to control an AR Drone with a Leap Motion device. They let children fly the drone with it.

My work is about the application of AI to robotics, in particular with regard to decision making.

Automated reasoning and planning have reached great maturity, and yet their use in robotics is difficult and generally limited. The models we come up with, trading off abstraction, computational complexity, required knowledge, ease of implementation, and a number of other factors, are inevitably inaccurate, and that inaccuracy makes the generated plans brittle. Nonetheless, we still want to use those models to rationally guide our decisions. My main focus at the moment is on reconciling these two partially conflicting aspects.

In general, decision making is often performed in isolation, without much consideration for what should naturally follow and often doesn't: acting.

The attempt to mitigate the effects of uncertainty on knowledge representations brought me to work on Reinforcement Learning. During my PhD I packed my things and traveled to Amherst, MA, where I was a visiting student in Prof. Andy Barto's lab. The interesting work on decision making and robotics by Prof. Subramanian Ramamoorthy made me pack again and move to Scotland. Once I got my PhD, curiosity about robots less cute than the Nao brought me to Genova, where I worked on an Autonomous Underwater Vehicle, before landing in Austin, where I work on not-really-cute robots again, but this time in a dry place.

Automatic Generation of Hierarchies of Abstract Machines

A hierarchy of abstract machines (HAM), later developed into ALisp, is a hierarchical RL method that constrains the search space to what the designer considers reasonable. We developed the first algorithm (to the best of our knowledge) to generate a HAM through planning from a domain description [11]. The HAM encodes all the minimal-cost strong solutions to a non-deterministic planning problem, and allows the agent to learn which one is best in practice.

All the optimal plans are equivalent to the planner, yet, because the model is imperfect, they can have very different outcomes in practice. We let the agent learn through the HAM what the best behavior actually is, while limiting the search to the reasonable plans, that is, those optimal in the model. This restriction of the search space greatly speeds up learning, while still allowing the agent to characterize and learn the most reliable and effective plans.
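To give a feel for the idea, in the degenerate case of a single choice point, learning over the HAM reduces to a bandit problem over the model-optimal plans: the agent only ever chooses among plans the planner considers equivalent, and estimates their real-world value from experience. Here is a minimal sketch of that special case (all names are hypothetical, and this illustrates the principle rather than the algorithm of [11]):

import random
from collections import defaultdict

def learn_best_plan(plans, execute_plan, episodes=500, alpha=0.1, epsilon=0.1):
    """plans: list of candidate plans, all optimal under the planner's model.
    execute_plan: callable returning the real, stochastic return of a plan."""
    q = defaultdict(float)  # estimated real-world return of each plan
    for _ in range(episodes):
        # epsilon-greedy choice restricted to the model-optimal plans
        if random.random() < epsilon:
            i = random.randrange(len(plans))
        else:
            i = max(range(len(plans)), key=lambda j: q[j])
        ret = execute_plan(plans[i])   # noisy outcome in the real world
        q[i] += alpha * (ret - q[i])   # incremental value update
    return max(range(len(plans)), key=lambda j: q[j])

With multiple choice points, the same value updates happen at each choice state of the HAM instead of over whole plans, but the principle is unchanged: the model prunes the search space, and experience picks the winner.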

Policy Search and Stochastic Optimization

For the European project PANDORA, I worked on policy search and stochastic optimization. We developed an iterative, on-line algorithm [15] to identify the parameters of a non-linear dynamic model, and we used it to learn the model of Girona500, an Autonomous Underwater Vehicle developed at CIRS, University of Girona, Spain. We also used this model to plan on-line (through policy search) trajectories that remain executable in the event of a thruster failure [16,17], so that the AUV adapts to its new condition in a way similar to active fault-tolerant control.
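As a rough illustration of on-line identification by stochastic optimization, consider fitting the parameters of a dynamic model against a sliding window of recent data with a simple (1+1) evolution strategy. This is a generic sketch, not the algorithm of [15]; predict, data_stream, and the update rule are illustrative assumptions:

import numpy as np

def prediction_error(theta, window, predict):
    """Sum of squared one-step prediction errors over recent samples;
    predict(theta, s, u) is the parametric dynamic model."""
    return sum(np.sum((predict(theta, s, u) - s_next) ** 2)
               for s, u, s_next in window)

def online_identification(theta, data_stream, predict, sigma=0.1, window_size=50):
    """Yields the current parameter estimate after each incoming sample."""
    window = []
    for sample in data_stream:          # samples are (state, control, next_state)
        window.append(sample)
        window = window[-window_size:]  # keep only recent data: on-line adaptation
        candidate = theta + sigma * np.random.randn(*theta.shape)
        # greedily keep the perturbed parameters if they predict better
        if (prediction_error(candidate, window, predict)
                < prediction_error(theta, window, predict)):
            theta = candidate
        yield theta

Because the window slides, the estimate keeps tracking the vehicle's dynamics, which is what makes the same machinery usable after a thruster failure changes the model.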

Plan Representation and Learning

One of the first problems I considered came from my RoboCup experience. In the Standard Platform League, almost every team programmed the robots' behaviors by hand, and while some brave researchers tried to use automated planning, not a single team performed learning on the robots at the level of tactical and strategic decisions (some learning did take place in optimizing walking gaits and other low-level control tasks).

State machines are the formalism of choice for representing behaviors, so we extended a similar but more general one, Petri Net Plans, to allow partially specified plans with choice points [3,7]. The formalism can express parallel actions, interrupts, and sensing. We did not impose constraints on the representation on the programmers, with the consequence that it may turn out to be non-Markovian. While eligibility traces alleviate the problem in most cases, sometimes TD methods just cannot learn anything. For those circumstances, we developed a global policy search algorithm, which relies on an estimate of the value function to shape the search, but whose convergence does not depend on it [8].
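The separation between shaping and convergence can be made concrete with a small sketch (illustrative only, not the algorithm of [8]; all names are hypothetical). A possibly wrong value estimate biases which candidate policies get sampled at the choice points, but candidates are ranked purely by their observed Monte Carlo return, so a bad estimate can slow the search down without breaking it:

import math
import random

def global_policy_search(choice_points, rollout, value_estimate,
                         iters=200, tau=1.0):
    """choice_points: dict mapping each choice point to its list of options.
    rollout(policy): executes the plan, returns the (noisy) episode return.
    value_estimate(cp, option): possibly non-Markovian, possibly wrong."""
    def sample_policy():
        policy = {}
        for cp, options in choice_points.items():
            # softmax over the value estimate: shapes the search only
            weights = [math.exp(value_estimate(cp, o) / tau) for o in options]
            policy[cp] = random.choices(options, weights=weights)[0]
        return policy

    best, best_return = None, float("-inf")
    for _ in range(iters):
        policy = sample_policy()
        ret = rollout(policy)      # ground truth: the return actually observed
        if ret > best_return:      # selection never consults the estimate
            best, best_return = policy, ret
    return best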

Publications