I've used Ray RLlib's DQN to train an agent in my custom simulator, and it usually produced good results after 15 million steps.
After playing around with DQN for a while, I'm now trying to train A2C in the same simulator. However, it's not even close to converging, as you can see in the graph below. A return of about -50 is the practical maximum in my simulator, and DQN usually reaches it by 15 million steps.
The simulator is exactly the same for both DQN and A2C:
- 71 discrete observations
- 3 discrete actions
I reckoned the environment shouldn't have to change at all between algorithms, but maybe I'm wrong there. A simplified sketch of the interface is below.
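For reference, this is roughly how the simulator exposes its spaces. The code is an illustrative sketch, not my actual simulator: the class name and the placeholder dynamics are made up, and encoding the "71 discrete observations" as a single `Discrete(71)` space is an assumption about the encoding.

```python
import gym
from gym.spaces import Discrete


class MySimulatorEnv(gym.Env):
    """Illustrative stand-in for the real simulator (not the actual code)."""

    def __init__(self, env_config=None):
        # Assumption: "71 discrete observations" means one Discrete(71) space.
        self.observation_space = Discrete(71)
        self.action_space = Discrete(3)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Placeholder transition and reward; the real simulator computes these.
        next_state = self.state
        reward = -1.0
        done = False
        return next_state, reward, done, {}
```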
Can someone think of a reason why A2C isn't learning in my simulator?
Parameters for A2C:
(These are the same as the default A2C configuration in Ray RLlib.)
# Should use a critic as a baseline (otherwise don't use value baseline;
# required for using GAE).
"use_critic": True,
# If true, use the Generalized Advantage Estimator (GAE)
# with a value function, see https://arxiv.org/pdf/1506.02438.pdf.
"use_gae": True,
# Size of rollout batch
"rollout_fragment_length": 20,
# GAE(gamma) parameter
"lambda": 1.0,
# Max global norm for each gradient calculated by worker
"grad_clip": 40.0,
# Learning rate
"lr": 0.0001,
# Learning rate schedule
"lr_schedule": None,
# Value Function Loss coefficient
"vf_loss_coeff": 0.5,
# Entropy coefficient
"entropy_coeff": 0.01,
# Min time per iteration
"min_iter_time_s": 10,
# Workers sample async. Note that this increases the effective
# rollout_fragment_length by up to 5x due to async buffering of batches.
"sample_async": False,
# Switch on Trajectory View API for A2/3C by default.
# NOTE: Only supported for PyTorch so far.
"_use_trajectory_view_api": True,
# A2C supports microbatching, in which we accumulate gradients over
# batch of this size until the train batch size is reached. This allows
# training with batch sizes much larger than can fit in GPU memory.
# To enable, set this to a value less than the train batch size.
"microbatch_size": None