I am trying to implement TD-Gammon, as described in this paper, which uses the TD-Lambda learning algorithm. This has been done already here, but that implementation is 4 years old and doesn't use TensorFlow 2. I am trying to do this in TensorFlow 2 and think I need to create a custom optimizer to perform the weight change described in the paper linked above.
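For reference, and assuming the linked paper is Tesauro's, the weight change it describes is the standard TD(λ) rule. It is written below with an index shift so that it uses the Y_{t-1} and Y_t notation from this question. The second line is the equivalent recursive form with an eligibility trace e_t, which is what implementations usually store:

```latex
w_t - w_{t-1} = \alpha \, (Y_t - Y_{t-1}) \sum_{k=1}^{t-1} \lambda^{t-1-k} \, \nabla_w Y_k

e_1 = \nabla_w Y_1, \qquad e_t = \lambda \, e_{t-1} + \nabla_w Y_t, \qquad w_t - w_{t-1} = \alpha \, (Y_t - Y_{t-1}) \, e_{t-1}
```

Here ∇_w Y_k is the gradient of the network output itself, not of a loss, which is why the update depends on the outputs rather than on a loss gradient.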
I know that to create a custom optimizer, you need to subclass the `Optimizer` class and implement the `_create_slots`, `_resource_apply_dense`, `_resource_apply_sparse`, and `get_config` methods. However, the TD-Lambda weight change requires the neural network outputs (`Y_{t-1}` and `Y_t` in the paper), and the `_resource_apply_dense` method doesn't seem to have access to them.
How do I access the neural network outputs? Or am I just going about this the wrong way?
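A minimal sketch of the TD(λ) update, assuming a hypothetical one-unit sigmoid model in NumPy in place of the real TD-Gammon network, shows that the rule only needs each output Y_t and its gradient ∇_w Y_t. Both are available in the training loop, outside the optimizer. The helpers `predict`, `grad_output`, and `td_lambda_episode` are illustrative names, and taking each gradient with the weights in place before that step's update is one common convention, not necessarily the paper's exact one.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(w, x):
    # Hypothetical stand-in for the TD-Gammon network: a single sigmoid unit.
    return sigmoid(np.dot(w, x))


def grad_output(w, x):
    # Gradient of the network OUTPUT (not of a loss) w.r.t. the weights:
    # dY/dw = Y * (1 - Y) * x for a sigmoid unit.
    y = predict(w, x)
    return y * (1.0 - y) * x


def td_lambda_episode(w, states, outcome, alpha=0.1, lam=0.7):
    """Apply TD(lambda) updates over one episode (hypothetical helper).

    states  : sequence of feature vectors x_1, ..., x_T
    outcome : final reward, used in place of the output after the last state
    """
    w = np.array(w, dtype=float)           # work on a copy of the weights
    y_prev = predict(w, states[0])         # Y_1
    trace = grad_output(w, states[0])      # e_1 = grad_w Y_1
    for x in states[1:]:
        y_curr = predict(w, x)             # Y_t, using the current weights
        g_curr = grad_output(w, x)         # grad_w Y_t, taken before the update
        # Weight change: alpha * (Y_t - Y_{t-1}) * e_{t-1}
        w += alpha * (y_curr - y_prev) * trace
        # Eligibility trace: e_t = lam * e_{t-1} + grad_w Y_t
        trace = lam * trace + g_curr
        y_prev = y_curr
    # Terminal step: the actual outcome plays the role of the next output.
    w += alpha * (outcome - y_prev) * trace
    return w
```

In TensorFlow 2 the same loop could obtain ∇_w Y_t from `tf.GradientTape` (calling `tape.gradient` on the model output for a single-output network, not on a loss) and apply the change directly with `tf.Variable.assign_add`. If you would rather keep a stock optimizer, passing `-(Y_t - Y_{t-1}) * e` as the "gradient" to `tf.keras.optimizers.SGD(learning_rate=alpha).apply_gradients(...)` should give the same step, assuming no momentum, because plain SGD subtracts the learning rate times the gradient. Either way, the outputs never need to reach `_resource_apply_dense`.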