I have fit a series of SciPy continuous distributions for a Monte Carlo simulation and am looking to take a large number of samples from them. However, I would like the samples to be correlated, such that the i-th sample draws the same percentile (e.g., the 90th) from each of the distributions.
In doing this, I've found a quirk in SciPy performance:
import numpy as np
import scipy as sp
import scipy.stats

# distro_props holds the fitted (shape, loc, scale) tuples for each distribution
# very fast way to take many uncorrelated samples of length n
for shape, loc, scale in distro_props:
    sp.stats.norm.rvs(*shape, loc=loc, scale=scale, size=n)

# verrrrryyyyy slow way to take correlated samples of length n
correlate = np.random.uniform(size=n)
for shape, loc, scale in distro_props:
    sp.stats.norm.ppf(correlate, *shape, loc=loc, scale=scale)
Most of the results I've found about this claim that the slowness of these SciPy distros is from the type-checking etc. wrappers. However, when I profiled the code, the vast bulk of the time is spent in the underlying math function [_continuous_distns.py:179(_norm_pdf)]. Furthermore, the time scales with n, implying that it's looping through every element internally.
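For reference, here is a minimal, self-contained version of the timing gap. I'm using skewnorm (one of the distributions I fit) rather than norm, with arbitrary values for n and the shape parameter a; timings will vary by machine:

import timeit
import numpy as np
import scipy.stats

n = 10_000
u = np.random.uniform(size=n)
a = 4.0  # arbitrary skewness parameter

# rvs samples directly via a transformation; ppf (at least on my version)
# appears to go through SciPy's generic numerical inversion
t_rvs = timeit.timeit(lambda: scipy.stats.skewnorm.rvs(a, size=n), number=1)
t_ppf = timeit.timeit(lambda: scipy.stats.skewnorm.ppf(u, a), number=1)
print(f"rvs: {t_rvs:.2f}s  ppf: {t_ppf:.2f}s")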
The SciPy docs on rv_continuous almost seem to suggest that the subclass should override _ppf for performance, but it seems bizarre that I would have to monkeypatch SciPy to speed up its ppf. I would just compute this for the normal from the closed-form ppf, but I also use the lognormal and skewed normal, which are more of a pain to implement.
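For concreteness, this is the kind of closed-form shortcut I mean for the first two; it follows SciPy's loc/scale parameterization and uses scipy.special.ndtri, the vectorized inverse standard-normal CDF. The skewed normal has no comparable closed form, which is the sticking point:

import numpy as np
from scipy.special import ndtri

def norm_ppf(q, loc=0.0, scale=1.0):
    # inverse CDF of N(loc, scale**2): loc + scale * Phi^-1(q)
    return loc + scale * ndtri(q)

def lognorm_ppf(q, s, loc=0.0, scale=1.0):
    # SciPy's lognorm(s, loc=loc, scale=scale) is loc + scale * exp(s * Z), Z ~ N(0, 1)
    return loc + scale * np.exp(s * ndtri(q))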
So, what is the best way in Python to compute a fast ppf for normal, lognormal, and skewed normal distributions? Or more broadly, to take correlated samples from several such distributions?