
I have been trying to use RL to train a quadruped to walk, but so far with no noteworthy success. Here are the details of the gym env I am using.
Sim: pybullet

env.action_space = spaces.Box(low=-1, high=1, shape=(12,))

The selected actions are converted by multiplying them by the max action specified for each joint (a sketch of this scaling is below). The action space covers the 3 motor positions (hip_joint_y, hip_joint_x, knee_joint) x 4 legs of the robot.
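
In sketch form (the MAX_ACTIONS values here are placeholders; the real limits come from the robot's URDF):

import numpy as np

# Per-joint position limits in radians; placeholder values,
# the actual limits come from the URDF.
MAX_ACTIONS = np.array([0.5, 0.8, 1.2] * 4)  # (hip_y, hip_x, knee) x 4 legs

def scale_action(action):
    # Map a policy action in [-1, 1]^12 to joint position targets.
    action = np.clip(action, -1.0, 1.0)
    return action * MAX_ACTIONS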

env.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(12,))

The observation space includes (a sketch that assembles this vector follows the list):

  • roll and pitch of the body [r, p]
  • angular velocity [x, y, z]
  • linear acceleration [x, y, z]
  • binary foot contacts, 1 if the leg is in contact else 0: [1, 1, 1, 1]
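
Roughly how the 12-dim observation is assembled (the foot link indices and timestep are placeholders for my URDF's values; pybullet exposes no accelerometer, so linear acceleration is finite-differenced here):

import numpy as np
import pybullet as p

FOOT_LINK_IDS = [3, 7, 11, 15]  # placeholder; use the foot link indices from your URDF
DT = 1.0 / 240.0                # pybullet's default timestep

def get_observation(robot_id, prev_lin_vel):
    _, orn = p.getBasePositionAndOrientation(robot_id)
    roll, pitch, _ = p.getEulerFromQuaternion(orn)
    lin_vel, ang_vel = p.getBaseVelocity(robot_id)
    # No accelerometer in pybullet: finite-difference the base velocity
    lin_acc = (np.array(lin_vel) - np.array(prev_lin_vel)) / DT
    contacts = [1.0 if p.getContactPoints(bodyA=robot_id, linkIndexA=l) else 0.0
                for l in FOOT_LINK_IDS]
    obs = np.concatenate(([roll, pitch], ang_vel, lin_acc, contacts)).astype(np.float32)
    return obs, lin_vel  # 2 + 3 + 3 + 4 = 12 dims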
reward = (
             + distance_reward
             - body_rotation_penalty
             - energy_usage_penalty
             - body_drift_from_x_axis_penalty
             - body_shake_penalty)
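
In sketch form (the individual terms and all weights below are placeholders, not my tuned values):

import numpy as np

def compute_reward(x_pos, prev_x_pos, roll, pitch, yaw, torques, y_pos, ang_vel):
    # Forward progress along the x-axis
    distance_reward = x_pos - prev_x_pos
    # Penalize body orientation away from upright
    body_rotation_penalty = roll**2 + pitch**2 + yaw**2
    # Penalize actuation effort
    energy_penalty = np.sum(np.square(torques))
    # Penalize lateral drift away from the x-axis
    drift_penalty = abs(y_pos)
    # Penalize shaking (high angular rates)
    shake_penalty = np.sum(np.square(ang_vel))
    # Placeholder weights; tuning these is most of the work
    return (1.0 * distance_reward
            - 0.1 * body_rotation_penalty
            - 0.001 * energy_penalty
            - 0.5 * drift_penalty
            - 0.05 * shake_penalty)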

I have tried the following approaches.

  • Using PPO from stable-baselines3 for 20 million timesteps [no distinct improvement].

  • Using DDPG, TD3, SAC, A2C, and PPO for 5 million timesteps each, increasing the policy network up to 4 layers of 1024 neurons each ([1024, 1024, 1024, 1024] for qf and vf, or actor and critic); a sketch of this setup is after this list.

  • Using the discrete-delta concept to scale action limits, changing the action_space from Box to MultiDiscrete with each action ranging from 0 to 6, indexing into discrete_delta_vals = [-0.3, -0.1, -0.03, 0, 0.03, 0.1, 0.3]. Each joint value is decided by choosing one value from discrete_delta_vals and adding it to the previous action (see the second sketch after this list).

  • Keeping hip_joint_y of all legs at zero and changing the action space from Box(shape=(12,)) to Box(shape=(8,)). I trained this agent for another 6M timesteps; there was a small improvement at first, then the episode length and mean reward plateaued with no significant improvement afterwards.

  • I have generated half-ellipsoid foot trajectories with IK, and that works, but it is an explicitly robotics approach to the problem. I am currently looking into DeepMimic to use those trajectories to guide RL toward a stable walking gait. No significant breakthrough yet.
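
For reference, the stable-baselines3 setup from the first two bullets looks roughly like this (QuadrupedEnv is a placeholder name for my env class; the net_arch dict form assumes a recent stable-baselines3 version):

from stable_baselines3 import PPO

env = QuadrupedEnv()  # placeholder for the actual quadruped gym env

policy_kwargs = dict(net_arch=dict(pi=[1024, 1024, 1024, 1024],
                                   vf=[1024, 1024, 1024, 1024]))
model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=20_000_000)

And the discrete-delta action handling, in sketch form:

import numpy as np
from gym import spaces

DISCRETE_DELTA_VALS = np.array([-0.3, -0.1, -0.03, 0.0, 0.03, 0.1, 0.3])

# One choice out of 7 delta values per joint
action_space = spaces.MultiDiscrete([7] * 12)

def apply_discrete_deltas(discrete_action, prev_joint_targets, max_actions):
    # Index each joint's chosen delta and accumulate onto the previous targets
    deltas = DISCRETE_DELTA_VALS[discrete_action]
    return np.clip(prev_joint_targets + deltas, -max_actions, max_actions)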

Here is the repo link:

Check the scripts folder and go through the start_training_v(x).py scripts. Thanks in advance. If you feel like discussing the entire topic to sort this out, please drop your email in a comment and I'll reach out to you.

1 Answer


Hi, try using Nvidia IsaacGym. It runs PyTorch end to end on the GPU with PPO. I was able to train a custom URDF to walk in about 10 minutes of training.