
I am studying how to train a normalizing flow model from the tutorial below:

[screenshot of the tutorial's dequantization code, including the two ldj update lines discussed in the answer]

kelvin.aaa2

1 Answer


Note that 1-self.alpha is the derivative of the scaling operation, so the Jacobian of this operation is a diagonal matrix with np.prod(z.shape[1:]) entries on the diagonal, each equal to 1-self.alpha. The Jacobian determinant is then simply the product of these diagonal entries, which after taking the logarithm gives rise to

ldj += np.log(1 - self.alpha) * np.prod(z.shape[1:])
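
As a quick sanity check (my own sketch, not from the tutorial), the analytic value can be compared against the log-determinant of the Jacobian that autograd computes for the scaling on a single flattened sample:

import numpy as np
import torch

alpha = 1e-5
z = torch.rand(2, 3, 4, 4)  # (batch, channels, height, width)

# Analytic log-determinant: every diagonal entry of the Jacobian is
# (1 - alpha), and there are np.prod(z.shape[1:]) = 48 of them.
analytic_ldj = np.log(1 - alpha) * np.prod(z.shape[1:])

# Autograd check on one flattened sample: build the full Jacobian of the
# scaling and take the log of its determinant.
scale = lambda x: (1 - alpha) * x + 0.5 * alpha
J = torch.autograd.functional.jacobian(scale, z[0].flatten())
autograd_ldj = torch.logdet(J)

print(analytic_ldj, autograd_ldj.item())  # both approx -4.8e-4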

The second line accounts for the log-determinant of the sigmoid $s(z)$, whose derivative is $s'(z)=s(z)(1-s(z))$. The two lines result from the chain rule: the Jacobian determinant of a composition of transformations is the product of the individual determinants, which turns into a sum when taking the logarithm.
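
For reference, here is a sketch of how the two lines fit together in the inverted-sigmoid (logit) step of a dequantization module; this follows the structure suggested by the screenshot, but the surrounding code is my reconstruction:

import numpy as np
import torch

def inverse_sigmoid(z, ldj, alpha=1e-5):
    # Scale z slightly towards 0.5 so the logit below stays finite.
    z = z * (1 - alpha) + 0.5 * alpha
    # First line: log|det J| of the scaling, log(1 - alpha) per entry,
    # times the number of entries per sample.
    ldj += np.log(1 - alpha) * np.prod(z.shape[1:])
    # Second line: log|det J| of the logit. By the inverse function rule,
    # (s^{-1})'(z) = 1 / s'(s^{-1}(z)) = 1 / (z * (1 - z)), whose log is
    # -log(z) - log(1 - z), summed over the non-batch dimensions.
    ldj += (-torch.log(z) - torch.log(1 - z)).sum(dim=[1, 2, 3])
    z = torch.log(z) - torch.log(1 - z)  # logit(z) = s^{-1}(z)
    return z, ldj

z, ldj = inverse_sigmoid(torch.rand(8, 3, 4, 4), torch.zeros(8))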

Setting ldj = torch.zeros(1,) is just the initialization of this variable; its value is only updated inside the module. I am not sure what the motivation is, but it could be that they want to apply the dequant_module to each individual sample in the batch, as sketched below.
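
If the intent is indeed a per-sample log-determinant, the more common pattern is to initialize one accumulator entry per batch element (this is my illustration, not code from the tutorial):

import torch

z = torch.rand(8, 3, 4, 4)
# One ldj accumulator per sample, so each batch element tracks its own
# log-determinant as it passes through the flow.
ldj = torch.zeros(z.shape[0], device=z.device)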

Butters