Learning to Walk
To learn to walk faster, the Aibos evaluated different gaits by walking back and forth across the field between pairs of beacons and timing how long each lap took. The learning was done entirely on the physical robots, with no human intervention other than changing the batteries. To speed up the process, we had three Aibos working simultaneously, dividing the search space among them.
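The underlying optimization is a policy gradient search over the walk's parameter vector, with the gradient estimated empirically from the timed laps (see the first paper listed below). The sketch that follows is a minimal illustration of this style of search, not our exact implementation: the `evaluate_gait` routine stands in for a timed lap on the robot, and the perturbation sizes, step size, and batch size are illustrative placeholders rather than the values used in our experiments.

```python
import random


def evaluate_gait(params):
    """Stand-in for the real evaluation. On the robot, this would command a
    timed lap between the beacons and return the measured speed in mm/s; here
    it just scores closeness to an arbitrary target so the sketch runs."""
    return -sum((p - 0.5) ** 2 for p in params)


def policy_gradient_step(theta, epsilon, eta, n_policies=15):
    """One iteration of a perturbation-based policy gradient search.

    theta      -- current gait parameter vector
    epsilon    -- per-parameter perturbation sizes (illustrative)
    eta        -- overall step size for the update (illustrative)
    n_policies -- number of perturbed gaits evaluated per iteration
    """
    # Evaluate a batch of randomly perturbed gaits: each parameter is
    # independently nudged down, left alone, or nudged up.
    perturbations, scores = [], []
    for _ in range(n_policies):
        signs = [random.choice((-1, 0, 1)) for _ in theta]
        candidate = [t + s * e for t, s, e in zip(theta, signs, epsilon)]
        perturbations.append(signs)
        scores.append(evaluate_gait(candidate))

    overall = sum(scores) / len(scores)

    # Estimate the gradient one dimension at a time by comparing the average
    # scores of the -epsilon, unchanged, and +epsilon groups for that dimension.
    adjustment = []
    for i in range(len(theta)):
        groups = {-1: [], 0: [], 1: []}
        for signs, score in zip(perturbations, scores):
            groups[signs[i]].append(score)
        avg = {s: (sum(v) / len(v) if v else overall) for s, v in groups.items()}
        if avg[0] >= avg[1] and avg[0] >= avg[-1]:
            adjustment.append(0.0)  # leaving this parameter unchanged looked best
        else:
            adjustment.append(avg[1] - avg[-1])

    # Normalize the adjustment to a fixed step size and update the parameters.
    norm = sum(a * a for a in adjustment) ** 0.5 or 1.0
    return [t + eta * a / norm for t, a in zip(theta, adjustment)]


if __name__ == "__main__":
    theta = [0.0] * 12                # a dozen gait parameters (illustrative)
    epsilon = [0.05] * len(theta)
    for _ in range(200):
        theta = policy_gradient_step(theta, epsilon, eta=0.02)
    print(evaluate_gait(theta))       # should climb toward 0 as the "gait" improves
```

Because each candidate gait is evaluated independently, a batch like this can be split across the three robots mentioned above, which is what makes the parallel training practical.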
Initially, the Aibo's gait is clumsy and fairly slow (less than 150 mm/s). We deliberately started with a poor gait so that the learning process would not be systematically biased towards our best hand-tuned gait, which might have been locally optimal.
Midway through the training process, the Aibo is moving much faster than it was initially. However, it still exhibits some irregularities that slow it down.
After traversing the field a total of just over 1000 times over the course of 3 hours, we achieved our best learned gait, which allows the Aibo to move at approximately 291 mm/s. To our knowledge, this was the fastest reported walk on an Aibo as of November 2003. The hash marks on the field are 200 mm apart; the Aibo covers 9 of these 200 mm intervals (1800 mm) in 6.13 seconds, demonstrating a speed of 1800 mm / 6.13 s > 291 mm/s.
Achieving a fast walk is an essential part of being a competitive team. At competitions, if needed, we re-train our walk for the specific playing surface of the venue. Because the robots train with minimal human involvement, team members are free to work on other things during that time.
A fast gait is an essential component of any successful team in the RoboCup 4-legged league. However, quickly moving quadruped robots, including those with learned gaits, often move in a way that causes unsteady camera motion, which degrades the robot's visual capabilities. One direction in which we have continued this research is searching for a parameterized walk that is optimized for both speed and stability.
To the best of our knowledge, previous learned walks have focused exclusively on speed. Our method is fully implemented and tested on the Sony Aibo ERS-7 robot platform. The resulting gait is reasonably fast and considerably more stable than our previous fast gaits. We demonstrate that this stability can significantly improve the robot's visual object recognition.
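At the level of the optimizer, folding stability into the search comes down to changing the objective that each timed lap produces. The snippet below is a hypothetical illustration of one way to scalarize the two goals, assuming the robot reports the lap speed together with accelerometer readings gathered during the lap; the name `combined_fitness`, the variance-based shake penalty, and `stability_weight` are illustrative choices, not the exact formulation from the paper.

```python
def combined_fitness(speed_mm_s, accel_samples, stability_weight=1.0):
    """Score a gait by its measured speed minus a penalty for body shake.

    speed_mm_s       -- forward speed measured over the timed lap (mm/s)
    accel_samples    -- accelerometer readings collected during the lap
    stability_weight -- speed/steadiness trade-off (an illustrative knob)
    """
    mean = sum(accel_samples) / len(accel_samples)
    variance = sum((a - mean) ** 2 for a in accel_samples) / len(accel_samples)
    return speed_mm_s - stability_weight * variance
```

A score of this form can be dropped into the same kind of parameter search used for the speed-only gait, biasing it toward gaits that are both fast and steady.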
Full details of our approach are available in the following papers:
- Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion.
  Nate Kohl and Peter Stone.
  Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2619--2624, May 2004.
- Machine Learning for Fast Quadrupedal Locomotion.
  Nate Kohl and Peter Stone.
  Proceedings of the Nineteenth National Conference on Artificial Intelligence, pp. 611--616, July 2004.
- Autonomous Learning of Stable Quadruped Locomotion.
  Manish Saggar, Thomas D'Silva, Nate Kohl, and Peter Stone.
  In Gerhard Lakemeyer, Elizabeth Sklar, Domenico Sorenti, and Tomoichi Takahashi, editors, RoboCup-2006: Robot Soccer World Cup X, Lecture Notes in Artificial Intelligence, pp. 98--109, Springer Verlag.