I just started a project in which I need to sample from a high-dimensional (currently ~10^3-dimensional, eventually ~10^6) and highly non-Gaussian posterior distribution via a Hamiltonian Monte Carlo (HMC) sampler. In my specific case the gradients of the potential can be computed analytically and the mass matrix can be crudely approximated. As far as I understand, most probabilistic programming packages (like Stan, TensorFlow Probability, Edward, PyMC, ...) compute gradients via automatic differentiation, which I suspect carries more overhead than evaluating my analytic gradient expressions directly, and they cannot easily exploit my mass-matrix approximation.
Right now I have a very naive Python implementation along the lines of the pseudocode below, which is way too slow for high-dimensional computations. Do you know whether there exist any performance-boosting modifications of this algorithm, or whether there are already good, fast samplers around that explicitly take these analytical results into account?
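For concreteness, here is a minimal sketch of the kind of naive single-chain leapfrog HMC step I mean, assuming a standard formulation with a user-supplied analytic gradient of the potential U(q) and a diagonal mass matrix; the function and argument names (`hmc_step`, `potential`, `grad_potential`, `mass_diag`) are illustrative placeholders, not from any library:

```python
import numpy as np

def hmc_step(q, potential, grad_potential, mass_diag, step_size, n_leapfrog, rng):
    """One HMC transition: sample momentum, leapfrog-integrate, accept/reject.

    q              : current position (1-D array)
    potential      : callable U(q), negative log posterior up to a constant
    grad_potential : callable returning the analytic gradient of U
    mass_diag      : diagonal of the (approximate) mass matrix M
    """
    # Resample momentum p ~ N(0, M) for a diagonal M.
    p = rng.normal(0.0, np.sqrt(mass_diag), size=q.shape)
    q_new, p_new = q.copy(), p.copy()

    # Leapfrog integration: half step in p, full steps in q and p, half step in p.
    p_new -= 0.5 * step_size * grad_potential(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new / mass_diag
        p_new -= step_size * grad_potential(q_new)
    q_new += step_size * p_new / mass_diag
    p_new -= 0.5 * step_size * grad_potential(q_new)

    # Metropolis accept/reject on the total energy H(q, p) = U(q) + K(p).
    h_old = potential(q) + 0.5 * np.sum(p ** 2 / mass_diag)
    h_new = potential(q_new) + 0.5 * np.sum(p_new ** 2 / mass_diag)
    if rng.random() < np.exp(min(0.0, h_old - h_new)):
        return q_new, True   # accepted
    return q, False          # rejected
```

The cost per transition is dominated by the `n_leapfrog` gradient evaluations, which is exactly where analytic gradients should pay off relative to autodiff or finite differences.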