I've used Ray RLlib's DQN to train in my custom simulator. It usually produced good results after 15 million steps.

After playing around with DQN for a while, I'm now trying to train A2C in the same simulator. However, it's not even close to converging, as you can see in the graph below. -50 is usually considered the maximum in my simulator, and DQN mostly reaches it by 15 million steps.

[Graph: A2C training progress in the simulator]

The simulator is exactly the same for both DQN and A2C:

  • 71 discrete observations
  • 3 discrete actions

I reckoned the environment doesn't have to change at all for either algorithm. Maybe I'm wrong there...

Can someone think of a reason why A2C isn't learning in my simulator?

Parameters for A2C:

(Same as the default configuration on Ray RLlib)

    # Should use a critic as a baseline (otherwise don't use value baseline;
    # required for using GAE).
    "use_critic": True,
    # If true, use the Generalized Advantage Estimator (GAE)
    # with a value function, see https://arxiv.org/pdf/1506.02438.pdf.
    "use_gae": True,
    # Size of rollout batch
    "rollout_fragment_length": 20,
    # GAE(lambda) parameter
    "lambda": 1.0,
    # Max global norm for each gradient calculated by worker
    "grad_clip": 40.0,
    # Learning rate
    "lr": 0.0001,
    # Learning rate schedule
    "lr_schedule": None,
    # Value Function Loss coefficient
    "vf_loss_coeff": 0.5,
    # Entropy coefficient
    "entropy_coeff": 0.01,
    # Min time per iteration
    "min_iter_time_s": 10,
    # Workers sample async. Note that this increases the effective
    # rollout_fragment_length by up to 5x due to async buffering of batches.
    "sample_async": False,
    # Switch on Trajectory View API for A2/3C by default.
    # NOTE: Only supported for PyTorch so far.
    "_use_trajectory_view_api": True,
    # A2C supports microbatching, in which we accumulate gradients over
    # batch of this size until the train batch size is reached. This allows
    # training with batch sizes much larger than can fit in GPU memory.
    # To enable, set this to a value less than the train batch size.
    "microbatch_size": None
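
For reference, here is a minimal sketch of what `"lambda": 1.0` implies for the advantage estimates, assuming the standard GAE recursion from the paper linked above. This is not RLlib's actual code: the function names `gae_advantages` and `discounted_returns` are hypothetical, `gamma=0.99` is an assumed discount factor that does not appear in the config listed here, and episode termination inside a fragment is ignored for simplicity. With `lam=1.0` the advantage should reduce to the bootstrapped discounted return minus the critic's value, and with `lam=0.0` to the one-step TD error.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float], last_value: float,
                   gamma: float = 0.99, lam: float = 1.0) -> List[float]:
    """Generalized Advantage Estimation over one rollout fragment.

    rewards[t] and values[t] are the reward and critic estimate at step t;
    last_value is the critic estimate for the state after the final step.
    gamma=0.99 is an assumption, not a value taken from the posted config.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def discounted_returns(rewards: List[float], last_value: float,
                       gamma: float = 0.99) -> List[float]:
    """Discounted returns over the fragment, bootstrapped from last_value."""
    returns = [0.0] * len(rewards)
    running = last_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Toy fragment of 20 steps, matching rollout_fragment_length above.
    rewards = [-1.0] * 20
    values = [-5.0] * 20
    adv = gae_advantages(rewards, values, last_value=-5.0, lam=1.0)
    ret = discounted_returns(rewards, last_value=-5.0)
    # With lam=1.0 these two columns should agree.
    for a, r, v in zip(adv[:3], ret[:3], values[:3]):
        print(a, r - v)
```

So with the default `lambda` of 1.0, the estimates are the least biased but highest variance that GAE can produce over each 20-step fragment, which may be worth keeping in mind when comparing against DQN.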
Kai Yun