I have a large dataset, on the order of 2^15 entries, and I compute the confidence interval of the mean of the entries with scipy.stats.bootstrap. For a dataset of this size, this takes about 6 seconds on my laptop. I have many datasets, so this is too slow for me (especially when I just want to do a quick test run to debug the plotting, etc.). By default, SciPy's bootstrapping function resamples the data n_resamples=9999 times. As I understand it, resampling and computing the average of each resampled dataset should be the most time-consuming part of the process. However, when I reduce the number of resamples by roughly three orders of magnitude (n_resamples=10), the run time of the bootstrapping call does not even halve.
How can I do faster bootstrapping?
I'm using Python 3 and SciPy 1.9.3.
import numpy as np
from scipy import stats
from time import time

# bootstrap expects a sequence of samples, so wrap the 1-D data in an outer array
data = np.random.rand(2**15)
data = np.array([data])

# default number of resamples, processed one at a time (batch=1)
start = time()
bs = stats.bootstrap(data, np.mean, batch=1, n_resamples=9999)
end = time()
print(end - start)

# ~1000x fewer resamples, still batch=1
start = time()
bs = stats.bootstrap(data, np.mean, batch=1, n_resamples=10)
end = time()
print(end - start)

# ~1000x fewer resamples, with the default batch setting
start = time()
bs = stats.bootstrap(data, np.mean, n_resamples=10)
end = time()
print(end - start)
gives
6.021066904067993
3.9989020824432373
30.46708607673645
To speed up the bootstrapping, I set batch=1. As I understand it, this is more memory efficient and keeps the data from being swapped out. Leaving batch at its default increases the run time considerably, as the third timing above shows.
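For comparison, this is roughly the resample-and-average work I have in mind when I say it should dominate the cost. It is only a rough sketch, not equivalent to what scipy.stats.bootstrap computes (SciPy's default is the BCa interval, whereas this just takes percentiles), and the chunk size and the 95% level here are arbitrary choices of mine:

import numpy as np

rng = np.random.default_rng()
sample = np.random.rand(2**15)

n_resamples = 9999
chunk = 100  # resamples drawn and averaged per vectorized step, to bound memory use

means = np.empty(n_resamples)
for lo in range(0, n_resamples, chunk):
    hi = min(lo + chunk, n_resamples)
    # draw indices with replacement and average each resampled row
    idx = rng.integers(0, sample.size, size=(hi - lo, sample.size))
    means[lo:hi] = sample[idx].mean(axis=1)

# plain 95% percentile interval (not the BCa interval SciPy returns by default)
ci_low, ci_high = np.percentile(means, [2.5, 97.5])

This is the kind of loop I assumed scipy.stats.bootstrap spends its time in, which is why the weak dependence of the run time on n_resamples surprises me.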