
I have been trying to understand a blog post on Soft Actor-Critic in which a neural network representing the policy outputs the mean and standard deviation of a Gaussian distribution over actions for a given state. Since direct back-propagation through a stochastic node is not possible, the reparameterization trick is applied as follows:

    normal   = Normal(0, 1)
    z        = normal.sample()
    action   = torch.tanh(mean + std * z.to(device))
    log_prob = Normal(mean, std).log_prob(mean + std * z.to(device)) - torch.log(1 - action.pow(2) + epsilon)
    return action, log_prob, z, mean, log_std

I want to know how the log_prob term was derived. Any help would be highly appreciated.

  • I think I might be too late, but you can read Appendix C of the original [Soft Actor-Critic paper](https://arxiv.org/abs/1801.01290). Basically, this `torch.log(1 - action.pow(2) + epsilon)` is from squashing the `action` between -1 and 1 using the `tanh` function. – esh3390 Jun 04 '23 at 10:21
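For what it's worth, the correction term in that comment follows from the change-of-variables formula: with `u = mean + std * z` and `action = tanh(u)`, the density of the squashed action is the Gaussian density of `u` divided by the Jacobian `|d tanh(u)/du| = 1 - tanh(u)^2`, so `log_prob = log N(u; mean, std) - log(1 - action^2)` (summed over action dimensions; the `epsilon` only guards against `log(0)`). A minimal numeric sketch of that Jacobian step, using only the standard library (the helper names here are made up for illustration):

```python
import math

def tanh_jacobian_log(u):
    # d/du tanh(u) = 1 - tanh(u)^2, so the log |Jacobian| of the
    # squashing a = tanh(u) is log(1 - tanh(u)^2) = log(1 - a^2).
    return math.log(1.0 - math.tanh(u) ** 2)

def numeric_jacobian_log(u, h=1e-6):
    # Central finite difference of tanh, then take the log,
    # as an independent check of the analytic Jacobian above.
    return math.log((math.tanh(u + h) - math.tanh(u - h)) / (2.0 * h))

def gaussian_log_pdf(x, mean, std):
    # log N(x; mean, std) for a scalar x
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def squashed_log_prob(u, mean, std):
    # log-density of the squashed action a = tanh(u):
    # log pi(a) = log N(u; mean, std) - log(1 - tanh(u)^2)
    return gaussian_log_pdf(u, mean, std) - tanh_jacobian_log(u)

print(abs(tanh_jacobian_log(0.7) - numeric_jacobian_log(0.7)))  # tiny (finite-difference error)
```

This is exactly what the snippet computes element-wise, except that PyTorch evaluates `log(1 - action**2 + epsilon)` on tensors and the per-dimension terms are typically summed to get the log-probability of the full action vector.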
