My work is about the application of AI to robotics, in particular to decision making.
Automated reasoning and planning have reached great maturity, and yet their use in robotics remains difficult and generally limited. The models we come up with, trading off abstraction, computational complexity, required knowledge, ease of implementation, and a number of other factors, are inevitably inaccurate, and that inaccuracy translates directly into the brittleness of the plans generated. Nonetheless, we still want to use those models to rationally guide our decisions. My main focus at the moment is on reconciling these two partially conflicting aspects.
In general, decision making is performed in isolation, without really considering what should naturally follow and often doesn't: acting.
The attempt to mitigate the effects of uncertainty on knowledge representations brought me to Reinforcement Learning. During my PhD I packed my stuff and traveled to Amherst, MA, where I was a visiting student in Prof. Andy Barto's lab. The interesting work in decision making and robotics of Prof. Subramanian Ramamoorthy made me pack again and move to Scotland. Once I got my PhD, curiosity towards robots less cute than Nao brought me to Genova, where I worked on an Autonomous Underwater Vehicle, before landing in Austin, where I work on not-really-cute robots again, but this time in a dry place.
Automatic Generation of Hierarchies of Abstract Machines
A hierarchy of abstract machines (HAM), later developed into ALisp, is a hierarchical RL method to constrain the search space to what the designer considers reasonable. We developed the first algorithm (to the best of our knowledge...) to generate a HAM through planning from a domain description [11]. The HAM encodes all the minimal-cost strong solutions to a non-deterministic planning problem, and allows the agent to learn the best one in practice.
All the optimal plans are equivalent to the planner, while, due to the imperfection of the model, they can in fact have very different outcomes in practice. We let the agent learn through the HAM what the best behavior actually is, while restricting the search to the reasonable (optimal in a certain model) plans. This limits the search space, speeding up learning greatly, while at the same time allowing the agent to identify and learn the more reliable and effective plans.
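As an illustrative sketch (not the algorithm from [11]), the core idea can be reduced to a single choice point between planner-equivalent plans, with a value estimate learned per branch. The plan names and return distributions below are made up for the example.

```python
import random

# Toy sketch: two plans a planner considers equally optimal, but whose
# real-world outcomes differ. A choice point in the controller lets the
# agent learn which plan actually performs better.

# Hypothetical noisy returns of each plan (unknown to the agent).
TRUE_RETURN = {"plan_a": lambda: random.gauss(1.0, 0.3),
               "plan_b": lambda: random.gauss(0.3, 0.3)}

def learn_choice(episodes=2000, epsilon=0.1, alpha=0.1, seed=0):
    random.seed(seed)
    q = {p: 0.0 for p in TRUE_RETURN}        # value estimate per branch
    for _ in range(episodes):
        if random.random() < epsilon:        # epsilon-greedy at the choice point
            plan = random.choice(list(q))
        else:
            plan = max(q, key=q.get)
        ret = TRUE_RETURN[plan]()            # execute the plan, observe return
        q[plan] += alpha * (ret - q[plan])   # incremental value update
    return q

q = learn_choice()
best = max(q, key=q.get)
```

A real HAM composes many such choice states inside a machine hierarchy, but the learning principle at each choice point is the same.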
Policy Search and Stochastic Optimization
For the European project PANDORA, I worked on policy search and stochastic optimization. We developed an iterative, on-line algorithm [15] to identify the parameters of a non-linear dynamic model, and we used it to learn the model of Girona500, an Autonomous Underwater Vehicle developed at CIRS, University of Girona, Spain. We also used this model to plan on-line (through policy search) trajectories executable in the event of a thruster failure [16,17], so that the AUV adapts to its new condition in a way similar to active fault-tolerant control.
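To give the flavor of derivative-free identification, here is a minimal sketch on a made-up one-dimensional surge model with a single drag coefficient. The model, constants, and the simple (1+1) local search are assumptions for illustration only; [15] uses a global derivative-free optimizer on the full AUV dynamics.

```python
import random

# Toy 1-D vehicle model: v_next = v + dt * (u - d * v * |v|) / m.
# We recover the drag coefficient d from input/state data by
# derivative-free search on the squared prediction error.

DT, M, TRUE_D = 0.1, 10.0, 4.0

def step(v, u, d):
    return v + DT * (u - d * v * abs(v)) / M

def make_data(n=200, seed=1):
    random.seed(seed)
    data, v = [], 0.0
    for _ in range(n):
        u = random.uniform(-5, 5)          # random thrust input
        v_next = step(v, u, TRUE_D)        # simulated "measurement"
        data.append((v, u, v_next))
        v = v_next
    return data

def loss(d, data):
    return sum((step(v, u, d) - vn) ** 2 for v, u, vn in data)

def identify(data, iters=500, sigma=0.5, seed=2):
    random.seed(seed)
    d_best = 1.0                           # initial guess
    f_best = loss(d_best, data)
    for _ in range(iters):
        cand = d_best + random.gauss(0.0, sigma)   # perturb the parameter
        f = loss(cand, data)
        if f < f_best:                     # greedy (1+1) acceptance
            d_best, f_best = cand, f
    return d_best

data = make_data()
d_hat = identify(data)
```

Since the loss needs only function evaluations, the same scheme applies unchanged when the model is a black-box simulator rather than a closed-form equation.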
Plan Representation and Learning
One of the first problems I considered came from my RoboCup experience. In the Standard Platform League, almost every team programmed the robots' behaviors by hand, and while some brave researchers tried to use automated planning, no team performed learning on the robots at the level of tactical and strategic decisions (some learning did happen in optimizing walking gaits and other low-level control tasks).
State machines are the formalism of choice for representing behaviors, so we extended a similar but more general one, Petri Net Plans, to allow partially specified plans with choice points [3,7]. The formalism can express parallel actions, interrupts, and sensing. We did not impose constraints on the programmers' representations, with the consequence that they may turn out to be non-Markovian. While eligibility traces alleviate the problem in most cases, sometimes TD methods just cannot learn anything. For those circumstances, we developed a global policy search algorithm that relies on an estimate of the value function to shape the search, but whose convergence does not depend on it [8].
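A minimal sketch of the idea, with made-up returns (this is not the algorithm of [8]): value estimates bias a softmax sampling over whole policies, while a uniform exploration floor keeps every policy reachable, so the search can still find the best policy even when the estimates are misleading.

```python
import random, math

# Global search over the 2^K policies of a plan with K binary choice
# points. Estimates shape the sampling; the epsilon floor guarantees
# that convergence does not depend on them.

K = 4
random.seed(0)
# Hypothetical hidden mean return of each policy (a tuple of K choices);
# (1, 1, 1, 1) is the best by construction.
MEANS = {p: sum(p) + random.random() * 0.1
         for p in [tuple((i >> b) & 1 for b in range(K)) for i in range(2 ** K)]}

def rollout(policy):
    return random.gauss(MEANS[policy], 0.5)      # noisy episode return

def global_search(iters=3000, eps=0.2, alpha=0.1, tau=1.0):
    q = {p: 0.0 for p in MEANS}                  # per-policy value estimates
    for _ in range(iters):
        if random.random() < eps:                # uniform exploration floor
            pol = random.choice(list(q))
        else:                                    # softmax biased by estimates
            ps = list(q)
            w = [math.exp(q[p] / tau) for p in ps]
            r = random.uniform(0, sum(w))
            acc = 0.0
            for p, wi in zip(ps, w):
                acc += wi
                if r <= acc:
                    pol = p
                    break
        ret = rollout(pol)                       # execute the whole policy
        q[pol] += alpha * (ret - q[pol])         # update its estimate
    return max(q, key=q.get)

best = global_search()
```

Because policies are evaluated by whole-episode returns rather than per-state TD updates, the search is unaffected by the non-Markovian observations that break TD methods.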
Publications
- [19] P. Khandelwal, F. Yang, M. Leonetti, V. Lifschitz, P. Stone. Planning in Action Language BC while Learning Action Costs for Mobile Robots. Proc. of 24th International Conference on Automated Planning and Scheduling (ICAPS), 2014.
- [18] F. Yang, P. Khandelwal, M. Leonetti, P. Stone. Planning in Answer Set Programming while Learning Action Costs for Mobile Robots. Proc. of the AAAI Spring Symposium on Knowledge Representation in Robotics, 2014.
- [17] Seyed Reza Ahmadzadeh, Matteo Leonetti, Arnau Carrera, Marc Carreras, Petar Kormushev, Darwin G Caldwell. Online Discovery of AUV Control Policies to Overcome Thruster Failures. Proc. of IEEE Intl Conf. on Robotics and Automation (ICRA), 2014.
- [16] M. Leonetti, S. R. Ahmadzadeh, P. Kormushev. On-line Learning to Recover from Thruster Failures on Autonomous Underwater Vehicles. Proceedings of OCEANS'13, 2013.
- [15] G. Karras, C. Bechlioulis, M. Leonetti, P. Kormushev, N. Palomeras, K. Kyriakopoulos, and D. G. Caldwell. On-line Identification of Autonomous Underwater Vehicles Through Global Derivative-free Optimization. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 2013.
- [14] S. R. Ahmadzadeh, M. Leonetti, P. Kormushev. Online Direct Policy Search for Thruster Failure Recovery in Autonomous Underwater Vehicles. Proceedings of the 6th International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems, 2013.
- [13] G. Beninati, M. Leonetti, N. G. Tsagarakis, D. G. Caldwell. A Methodology to Characterize Critical Falling Configurations for a Humanoid Robot. Proceedings of Eccomas Thematic Conference: Multibody Dynamics, 2013.
- [12] M. Leonetti, P. Kormushev, S. Sagratella. Combining Local and Global Direct Derivative-free Optimization for Reinforcement Learning. International Journal of Cybernetics and Information Technologies, Vol. 12, No. 3, pp. 53-65, 2012.
- [11] M. Leonetti, L. Iocchi, F. Patrizi. Automatic Generation and Learning of Finite-State Controllers. Proc. of 15th Int. Conf. on Artificial Intelligence: Methodology, Systems, Applications (AIMSA), 2012. (nominated for best paper award)
- [10] M. Leonetti, L. Iocchi, S. Ramamoorthy. Induction and Learning of Finite-State Controllers from Simulation (Extended Abstract). Proc. of 11th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
- [9] P. Romano, M. Leonetti. Self-tuning Batching in Total Order Broadcast Protocols via Analytical Modelling and Reinforcement Learning. Proc. of the International Conference on Computing, Networking and Communications, Maui, Hawaii, USA. 2012.
- [8] M. Leonetti, L. Iocchi, S. Ramamoorthy. Reinforcement Learning Through Global Stochastic Search in N-MDPs. Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Athens, Greece 2011.
- [7] M. Leonetti. Reinforcement Learning in Plan Space. PhD Thesis, Sapienza University of Rome. Department of Computer and System Science. Rome, Italy. 2011.
- [6] V.A. Ziparo, L. Iocchi, M. Leonetti, D. Nardi. A Probabilistic Action Duration Model for Plan Selection and Monitoring. Proc. of the International Conference on Intelligent Robots and Systems (IROS) 2010.
- [5] M. Leonetti, S. Ramamoorthy. A Heuristic Strategy for Learning in Partially Observable and Non-Markovian Domains. Proc. of the 3rd International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems (ERLARS), 2010.
- [4] M. Leonetti, L. Iocchi. Improving the Performance of Complex Agent Plans Through Reinforcement Learning. Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), van der Hoek, Kaminka, Lespérance, Luck and Sen (eds.), Toronto, Canada, May 10-14, 2010.
- [3] M. Leonetti, L. Iocchi. LearnPNP: A Tool for Learning Agent Behaviors. Proceedings of the RoboCup International Symposium 2010, Springer Verlag, 2010
- [2] V.A. Ziparo, L. Iocchi, M. Leonetti, D. Nardi. On-line robot execution monitoring using probabilistic action duration (Extended Abstract). Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), van der Hoek, Kaminka, Lespérance, Luck and Sen (eds.), Toronto, Canada, May 10-14, 2010.
- [1] L. Iocchi, M. Leonetti, D. Nardi, V. A. Ziparo. Representing and Embedding Fuzzy Controllers in Petri Net Plans. Workshop AI*IA Verso la Robotica Intenzionale, 2009.