I have been trying to train a quadruped to walk with RL, but so far with no noteworthy success. Here are the details of the Gym env I am using.
Sim: pybullet
env.action_space = Box(low=-1, high=1, shape=(12,))
The selected actions are scaled by multiplying them by the max angle specified for each joint. The 12 actions are the 3 target motor positions (hip_joint_y, hip_joint_x, knee_joint) x 4 legs of the robot, roughly as in the sketch below.
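(The joint limits here are placeholder values, not the real ones from my env.)

```python
import numpy as np

# Placeholder joint limits (rad): hip_joint_y, hip_joint_x, knee_joint, repeated for 4 legs.
MAX_JOINT_ANGLES = np.array([0.6, 0.8, 1.2] * 4)

def scale_action(action):
    """Map a normalized action in [-1, 1] to per-joint target positions."""
    action = np.clip(np.asarray(action), -1.0, 1.0)
    return action * MAX_JOINT_ANGLES
```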
env.observation_space = Box(low=-np.inf, high=np.inf, shape=(12,))
The observation includes:
- roll, pitch of the body [r, p]
- angular vel [x, y, z]
- linear acc [x, y, z]
- Binary foot-contact flags for each leg (1 if in contact, else 0), e.g. [1, 1, 1, 1]
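Put together, the 12-D observation is assembled roughly like this (simplified):

```python
import numpy as np

def build_observation(roll, pitch, angular_vel, linear_acc, foot_contacts):
    """12-D observation: [roll, pitch] + angular velocity (3) + linear acceleration (3) + 4 contact flags."""
    return np.concatenate([
        [roll, pitch],
        angular_vel,    # body angular velocity (x, y, z)
        linear_acc,     # body linear acceleration (x, y, z)
        foot_contacts,  # 1.0 if the foot is in contact, else 0.0, one per leg
    ]).astype(np.float32)
```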
reward = (
    + distance_reward
    - body_rotation_reward
    - energy_usage_reward
    - body_drift_from_x_axis_reward
    - body_shake_reward
)
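In code it looks roughly like this; the state names and weights below are illustrative stand-ins, the real values live in the env script:

```python
import numpy as np

def compute_reward(state, prev_state, dt):
    """Illustrative reward; terms mirror the expression above, weights are stand-ins."""
    distance_reward  = state["base_x"] - prev_state["base_x"]              # forward progress
    rotation_penalty = abs(state["roll"]) + abs(state["pitch"])            # keep the body level
    energy_penalty   = np.sum(np.abs(state["torques"] * state["joint_vels"])) * dt
    drift_penalty    = abs(state["base_y"])                                # lateral drift from the x-axis
    shake_penalty    = np.linalg.norm(state["linear_acc"])                 # penalize jerky motion
    return (distance_reward
            - 0.1 * rotation_penalty
            - 0.001 * energy_penalty
            - 0.1 * drift_penalty
            - 0.01 * shake_penalty)
```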
I have tried the following approaches.
Using PPO from stable-baselines3 for 20 million timesteps [no distinct improvement].
Using DDPG, TD3, SAC, A2C, and PPO for 5 million timesteps each, increasing the policy network up to 4 layers of 1024 neurons ([1024, 1024, 1024, 1024]) for both actor and critic (qf/vf). A rough sketch of these training calls is below.
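(QuadrupedEnv stands in for my actual env class; the net_arch syntax below is for recent stable-baselines3 versions.)

```python
from stable_baselines3 import PPO

env = QuadrupedEnv()  # stand-in for the actual pybullet quadruped env
policy_kwargs = dict(net_arch=dict(pi=[1024, 1024, 1024, 1024],
                                   vf=[1024, 1024, 1024, 1024]))
model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=20_000_000)
model.save("ppo_quadruped")
```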
Using a discrete-delta scheme for the actions: the action_space changes from Box to MultiDiscrete, with each of the 12 actions taking a value from 0 to 6 that indexes into discrete_delta_vals = [-0.3, -0.1, -0.03, 0, 0.03, 0.1, 0.3]. Each joint target is updated by choosing one value from discrete_delta_vals and adding it to the previous target (sketched below).
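(The clipping range here is a simplification on my part.)

```python
import numpy as np
from gym import spaces

DISCRETE_DELTA_VALS = np.array([-0.3, -0.1, -0.03, 0.0, 0.03, 0.1, 0.3])

# 12 joints, each choosing one of the 7 delta values (indices 0..6).
action_space = spaces.MultiDiscrete([7] * 12)

def apply_discrete_deltas(prev_targets, action):
    """Add the chosen delta to the previous normalized joint targets and keep them in [-1, 1]."""
    deltas = DISCRETE_DELTA_VALS[np.asarray(action)]
    return np.clip(prev_targets + deltas, -1.0, 1.0)
```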
Keeping hip_joint_y of all legs at zero and changing the action space from Box(shape=(12,)) to Box(shape=(8,)). I trained this agent for another 6M timesteps; there was a small improvement at first, but then the episode length and mean reward plateaued with no significant improvement afterwards.
I have generated half-ellipsoid foot trajectories with IK, and that works, but that is an explicitly classical-robotics approach to the problem. I am currently looking into DeepMimic, using those trajectories as reference motion to guide RL toward a stable walking gait. No significant breakthrough so far.
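The kind of imitation term I have in mind looks like this (the weights and error scale are untuned guesses, not something I have validated):

```python
import numpy as np

def imitation_reward(joint_pos, ref_joint_pos, task_reward, w_pose=0.7, w_task=0.3):
    """DeepMimic-style blend: exponential pose-tracking term plus the existing task reward.
    ref_joint_pos is the IK half-ellipsoid trajectory sampled at the current gait phase."""
    pose_error = np.sum((np.asarray(joint_pos) - np.asarray(ref_joint_pos)) ** 2)
    pose_reward = np.exp(-2.0 * pose_error)
    return w_pose * pose_reward + w_task * task_reward
```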
Check the scripts folder and go through the start_training_v(x).py scripts. Thanks in advance. If you feel like discussing the whole topic to sort this out, please drop your email in a comment and I'll reach out to you.