Using an A2C agent from this article, how can I get the numerical values of value_loss, policy_loss and entropy_loss while the weights are being updated?
The model I'm using is double-headed, and both heads share the same trunk. The policy head's output shape is [number of actions, batch size] and the value head's is [1, batch size]. When these loss functions are also passed as metrics, compiling the model raises a size-incompatibility error:
    self.model.compile(optimizer=self.optimizer,
                       metrics=[self._logits_loss, self._value_loss],
                       loss=[self._logits_loss, self._value_loss])
Both self._value_loss and self._policy_loss are executed as graphs, which means every variable inside them is a symbolic graph node rather than a concrete value. I found some examples where Tensor objects are evaluated with eval() to get the value out of a node. I don't understand them, because eval() needs a Session, and TensorFlow 2.x no longer uses Sessions (they survive only under tf.compat.v1).
Another lead: when I call train_on_batch() from the Keras Model API to train the model, the method returns losses. For reasons I don't understand, the only losses it returns come from the policy head, where the loss is calculated as policy_loss - entropy_loss. My goal is to get all three losses separately so I can plot them.
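To make the decomposition concrete, here is a minimal NumPy sketch that computes the three terms separately instead of folding them into one scalar. It assumes the common A2C formulation (advantage-weighted cross-entropy of the taken actions for the policy term, mean entropy of the action distribution, and a scaled mean squared error for the value term) and uses the Keras batch-first layout [batch_size, n_actions] for the logits. The helper name a2c_loss_terms and the coefficients value_c and entropy_c are hypothetical illustrations, not the article's code.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def a2c_loss_terms(logits, actions, advantages, returns, values,
                   value_c=0.5, entropy_c=1e-4):
    """Hypothetical helper returning each A2C loss term separately.

    logits:     [batch_size, n_actions] policy-head outputs (assumed layout)
    actions:    [batch_size] integer actions that were taken
    advantages: [batch_size] advantage estimates
    returns:    [batch_size] discounted returns
    values:     [batch_size] value-head outputs
    value_c, entropy_c: illustrative coefficients, not the article's values
    """
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)
    batch = np.arange(len(actions))
    # Cross-entropy of the taken actions, weighted by their advantages.
    policy_loss = np.mean(-log_probs[batch, actions] * advantages)
    # Mean entropy of the action distribution over the batch.
    entropy_loss = np.mean(-(probs * log_probs).sum(axis=-1))
    # Scaled mean squared error between returns and value estimates.
    value_loss = value_c * np.mean((returns - values) ** 2)
    return {
        "policy_loss": policy_loss,
        "entropy_loss": entropy_loss,
        "value_loss": value_loss,
        # The single combined number a policy-head loss would report.
        "logits_loss": policy_loss - entropy_c * entropy_loss,
    }
```

Returning the terms in a dict rather than as one combined value is what makes it possible to log and plot each of them on its own.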
Any help is welcome, I'm stuck.