As far as the SGPMC paper [1] goes, the pretraining should be pretty much identical to SVGP. However, the implementations (current dev version) differ a bit, and I'm having some trouble understanding everything (especially what happens in the conditionals when `q_sqrt=None`) due to the dispatch-based programming style.
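
To make the `q_sqrt=None` part concrete, this is roughly the computation I think the whitened conditional boils down to (a from-scratch NumPy sketch, not GPflow's actual code; `kernel`, `Z`, `f` and the jitter are placeholder names/values of mine):

```python
import numpy as np

def whitened_conditional(Xnew, Z, kernel, f, q_sqrt=None, jitter=1e-6):
    """Mean/cov of f(Xnew) given whitened inducing values v = f, where u = L v."""
    Kmm = kernel(Z, Z) + jitter * np.eye(len(Z))
    Kmn = kernel(Z, Xnew)
    Knn = kernel(Xnew, Xnew)
    L = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(L, Kmn)            # L^{-1} Kmn, shape [M, N]

    mean = A.T @ f                         # = Knm Kmm^{-1} u  with  u = L f
    cov = Knn - A.T @ A                    # conditional prior covariance
    if q_sqrt is not None:
        # q(v) = N(f, q_sqrt q_sqrt^T): propagate the variational covariance
        SA = q_sqrt.T @ A
        cov = cov + SA.T @ SA
    # With q_sqrt=None (the SGPMC case), v is just a point value, so the
    # extra covariance term is simply dropped.
    return mean, cov
```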
Do I see it correctly that the difference is that `q_mu`/`q_var` are now represented by that `self.V` normal distribution? And that the only other change is that whitening is on by default, because it's required for the sampling?
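
This is the correspondence I have in mind between the two parameterisations (again just a sketch; the toy RBF kernel and the shapes are my assumptions, only `q_mu`/`q_sqrt`/`V` are the names from the classes):

```python
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

M, D = 20, 1
rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 1.0, size=(M, 1))        # inducing inputs

# SVGP: an explicit Gaussian posterior over the inducing outputs,
#       q(u) = N(q_mu, q_sqrt q_sqrt^T), both trained by optimisation.
q_mu = np.zeros((M, D))
q_sqrt = np.eye(M)

# SGPMC: a whitened variable V with a standard-normal prior and u = L V,
#        where L = chol(Kmm); V is a point value that gets sampled/optimised.
Kmm = rbf(Z, Z) + 1e-6 * np.eye(M)
L = np.linalg.cholesky(Kmm)
V = rng.standard_normal((M, D))
u = L @ V                                     # implied inducing outputs

# So a point setting of V plays the role of q_mu in whitened coordinates,
# and there is no q_sqrt: the conditional is called with q_sqrt=None.
```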
The odd thing is that stochastic optimization (without any sampling yet) of SGPMC seems to work quite a bit better on my specific data than the SVGP class does, which got me a bit confused, since it should basically be the same.
[1] Hensman, James, et al. "MCMC for variationally sparse Gaussian processes." Advances in Neural Information Processing Systems. 2015.
Edit2:
In the current dev branch I see that the (negative) `training_objective` basically consists of `VariationalExp + self.log_prior_density()`, whereas the SVGP ELBO would be `VariationalExp - KL(q(u) || p(u))`. `self.log_prior_density()` apparently adds up all the prior densities. So the training objective looks like equation (7) of the SGPMC paper (the whitened optimal variational distribution).
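
To spell out the difference I mean, here is a toy side-by-side of the two objectives (dummy numbers and helper names are mine; only the overall structure is taken from the code):

```python
import numpy as np

def kl_to_standard_normal(q_mu, q_sqrt):
    # KL( N(q_mu, S) || N(0, I) ) with S = q_sqrt q_sqrt^T, q_sqrt lower-triangular
    m = q_mu.size
    trace_S = (q_sqrt ** 2).sum()
    logdet_S = 2.0 * np.log(np.abs(np.diag(q_sqrt))).sum()
    return 0.5 * (trace_S + (q_mu ** 2).sum() - m - logdet_S)

def log_density_standard_normal(x):
    return -0.5 * (x ** 2 + np.log(2.0 * np.pi)).sum()

# Dummy stand-ins for the pieces of the objective
var_exp = -12.3                           # sum_n E[log p(y_n | f_n)]  (placeholder)
q_mu, q_sqrt = np.zeros(20), np.eye(20)   # SVGP variational parameters
V = np.zeros(20)                          # SGPMC whitened variable (self.V)
log_prior_theta = 0.0                     # hyperparameter priors (placeholder)

# SVGP:  ELBO = VariationalExp - KL(q(u) || p(u))   (whitened: p(v) = N(0, I))
elbo = var_exp - kl_to_standard_normal(q_mu, q_sqrt)

# SGPMC: objective = VariationalExp + log_prior_density(),
#        where the prior density includes log N(V; 0, I) plus the θ priors
sgpmc_objective = var_exp + log_density_standard_normal(V) + log_prior_theta
```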
So by optimizing this optimal variational approximation to the posterior p(f*, f, u, θ | y), would we be getting a MAP estimate of the inducing points?