The agility of animals, particularly in complex activities such as running, turning, jumping, and backflipping, stands as an exemplar for robotic system design. Transferring this suite of behaviors to legged robots raises essential questions: How can a robot be trained to learn multiple locomotion behaviors simultaneously? How can the robot transition smoothly between these behaviors? How can these skills be integrated into a wide range of applications? This paper introduces the Versatile Instructable Motion prior (VIM), a reinforcement learning framework designed to incorporate a range of agile locomotion tasks suitable for advanced robotic applications. Our framework enables legged robots to learn diverse agile low-level skills by imitating animal motions and manually designed motions. Our Functionality reward guides the robot in acquiring varied skills, and our Stylization reward ensures that the robot's motions align with the reference motions. We evaluate the VIM framework both in simulation and in real-world deployment. To the best of our knowledge, this is the first work that allows a robot to concurrently learn diverse agile locomotion skills using a single learning-based controller in the real world.
We present the Versatile Instructable Motion prior (VIM), designed to acquire a wide range of agile locomotion skills concurrently from multiple reference motions. The development of our motion prior involves three stages: assembling a comprehensive dataset of reference motions from diverse sources; designing a motion prior that processes varying reference motions and the robot's proprioceptive feedback to generate motor commands; and, finally, training this motion prior with an imitation-based reward.
Our motion prior consists of a reference motion encoder and a low-level policy. The reference motion encoder maps varying reference motions into a condensed latent skill space, and the low-level policy, trained with our imitation reward, reproduces the robot motion corresponding to a given latent command.
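To make this two-module structure concrete, the following is a minimal sketch in PyTorch. It assumes the reference motion is supplied as a short window of upcoming frames and that the policy outputs joint-position targets as motor commands; the layer sizes, window length, and latent dimension are illustrative placeholders, not the values used in the paper.

```python
import torch
import torch.nn as nn

class ReferenceMotionEncoder(nn.Module):
    """Maps a window of reference-motion frames to a latent skill command z."""
    def __init__(self, frame_dim=30, window=4, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim * window, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, ref_frames):              # ref_frames: (B, window, frame_dim)
        return self.net(ref_frames.flatten(1))  # -> (B, latent_dim)

class LowLevelPolicy(nn.Module):
    """Maps proprioception and the latent command to motor commands (joint targets)."""
    def __init__(self, proprio_dim=45, latent_dim=16, num_joints=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(proprio_dim + latent_dim, 512), nn.ELU(),
            nn.Linear(512, 256), nn.ELU(),
            nn.Linear(256, num_joints),
        )

    def forward(self, proprio, z):
        return self.net(torch.cat([proprio, z], dim=-1))

# One control step: encode the upcoming reference frames, then act from proprioception.
encoder, policy = ReferenceMotionEncoder(), LowLevelPolicy()
ref_window = torch.randn(1, 4, 30)   # placeholder reference-motion window
proprio = torch.randn(1, 45)         # placeholder proprioceptive observation
joint_targets = policy(proprio, encoder(ref_window))
```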
Given this formulation of our motion prior, the robot learns diverse agile locomotion skills through our imitation reward and reward scheduling mechanism. The reward offers consistent guidance, ensuring that the robot captures both the functionality and the style inherent to the reference motion.
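As a rough illustration of how a functionality term, a stylization term, and a schedule could be combined, consider the sketch below. The specific error metrics (base position and velocity for functionality, joint-space pose similarity for style), the exponential shaping, and the linear warm-up schedule are assumptions for illustration only; the exact reward terms are defined by the method itself.

```python
import numpy as np

def imitation_reward(robot, ref, w_style):
    """Illustrative imitation reward: functionality term plus a scheduled stylization term.

    robot/ref: dicts with 'base_pos' (3,), 'base_vel' (3,), 'joint_pos' (12,).
    w_style: scheduling weight in [0, 1] for the stylization term.
    """
    # Functionality: does the robot achieve what the reference motion achieves
    # (tracked here via base position and velocity)?
    func_err = (np.linalg.norm(robot["base_pos"] - ref["base_pos"]) ** 2
                + np.linalg.norm(robot["base_vel"] - ref["base_vel"]) ** 2)
    r_func = np.exp(-2.0 * func_err)

    # Stylization: does the robot move the way the reference moves
    # (tracked here via joint-space pose similarity)?
    style_err = np.linalg.norm(robot["joint_pos"] - ref["joint_pos"]) ** 2
    r_style = np.exp(-1.0 * style_err)

    return r_func + w_style * r_style

def style_weight(step, warmup_steps=2_000_000):
    """Placeholder schedule: emphasize functionality first, then blend in style."""
    return min(1.0, step / warmup_steps)
```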
Reference motions are rendered step by step kinematically, without physical simulation. The robot might not be able to fully imitate a reference motion due to its own physical capacity.
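A minimal sketch of such kinematic replay is given below, assuming the reference motion is stored as an array of uniformly spaced frames that are linearly interpolated at the current time; the actual retargeting and rendering details may differ.

```python
import numpy as np

def sample_reference_frame(frames, frame_dt, t):
    """Kinematically replay a reference motion: look up (and interpolate) the frame
    at time t without stepping any physics simulation.

    frames: (N, D) array of reference poses (e.g., base pose + joint angles).
    frame_dt: time between consecutive reference frames.
    """
    phase = t / frame_dt
    i = int(np.floor(phase))
    if i >= len(frames) - 1:      # clamp at the last frame
        return frames[-1]
    alpha = phase - i             # linear interpolation between adjacent frames
    return (1.0 - alpha) * frames[i] + alpha * frames[i + 1]

# Usage: the replayed frame serves only as an imitation target; the robot may not
# reach it exactly because of its own physical limits.
ref_frames = np.random.randn(120, 42)   # placeholder motion clip (2 s at 60 Hz)
target = sample_reference_frame(ref_frames, frame_dt=1.0 / 60.0, t=0.5)
```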