I just want to add another approach that uses numpy.random.Generator.choice. This approach works whether your data is a NumPy array or a pandas DataFrame.
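The small sketch below checks that claim using hypothetical toy data. Two generators are seeded identically, and choice is called once on a DataFrame and once on its .to_numpy() array. Both calls return the same rows. In both cases the result is a plain NumPy array rather than a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two columns
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})

# Same seed -> same row indices, whichever input type is passed
from_frame = np.random.default_rng(0).choice(df, size=len(df), axis=0)
from_array = np.random.default_rng(0).choice(df.to_numpy(), size=len(df), axis=0)

print(type(from_frame).__name__, from_frame.shape)  # ndarray (4, 2)
print(np.array_equal(from_frame, from_array))       # True
```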
Using the sample of data you provided:
import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 3],
                   'A': [1, 1, 1, 1],
                   'B': [2, 2, 2, 2]})
df
Here is how I would do it with the NumPy approach:
import numpy as np

def simple_bootstrap(data, replace=True, replicates=5, random_state=None, shuffle=True):
    rng = np.random.default_rng(random_state)  # random_state seeds the generator
    def simple_resample(data, size=None, axis=0):
        size = len(data) if size is None else size  # default: same size as the original data
        return rng.choice(a=data, size=size, replace=replace, axis=axis, shuffle=shuffle)
    return [simple_resample(data) for _ in range(replicates)]
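When I call the function on my df as below, it returns a list of five arrays, one per replicate (replicates=5 by default). Each array holds 4 rows drawn at random, with replacement, from the data. Because the DataFrame is converted to a NumPy array, the column labels are dropped and the 'index' column appears as the first column: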
simple_bootstrap(df)
[array([[1, 1, 2],
[2, 1, 2],
[0, 1, 2],
[3, 1, 2]], dtype=int64),
array([[0, 1, 2],
[1, 1, 2],
[1, 1, 2],
[3, 1, 2]], dtype=int64),
array([[3, 1, 2],
[1, 1, 2],
[1, 1, 2],
[2, 1, 2]], dtype=int64),
array([[3, 1, 2],
[1, 1, 2],
[3, 1, 2],
[3, 1, 2]], dtype=int64),
array([[0, 1, 2],
[3, 1, 2],
[3, 1, 2],
[3, 1, 2]], dtype=int64)]
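Remember, replicates=5 gives five resampled arrays. The 4 is the number of rows in each one, because size defaults to len(data). As the NumPy documentation puts it, if a has more than one dimension, the size shape will be inserted into the axis dimension, so the output ndim will be a.ndim - 1 + len(size). Here a has shape (4, 3) and size=4 counts as a length-1 shape, so each replicate has ndim 2 - 1 + 1 = 2 and shape (4, 3).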
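Here is a small sketch of that shape rule on a stand-in 4x3 array (hypothetical data, not the DataFrame above). A scalar size keeps the input's dimensionality. A tuple size such as (5, 4) adds a dimension, which draws all five replicates in a single call.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns, like the 4x3 data above

one = rng.choice(a, size=4, axis=0)        # scalar size acts like (4,): ndim = 2 - 1 + 1
many = rng.choice(a, size=(5, 4), axis=0)  # tuple size: ndim = 2 - 1 + 2

print(one.shape)   # (4, 3)
print(many.shape)  # (5, 4, 3)
```

Each many[i] is a 4-row resample drawn with replacement. This can replace the list comprehension when you want all replicates stacked in one array.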
You could also extend the bootstrap function to take a statistic function that is applied to each replicate, collecting the results in a list, as in the example below:
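def simple_bootstrap(data, statfunction, replace=True, replicates=5, random_state=None, shuffle=True):
    rng = np.random.default_rng(random_state)  # random_state seeds the generator
    def simple_resample(data, size=None, axis=0):
        size = len(data) if size is None else size  # default: same size as the original data
        return rng.choice(a=data, size=size, replace=replace, axis=axis, shuffle=shuffle)
    # Apply statfunction to each resample and collect the estimates
    resample_estimates = [statfunction(simple_resample(data)) for _ in range(replicates)]
    return resample_estimates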
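As a usage illustration, here is a self-contained sketch that bootstraps the mean of some hypothetical normally distributed data and reads off a simple 95% percentile interval. Instead of calling a statistic function in a loop, it draws every replicate in one call with size=(replicates, n) and applies the statistic row-wise. The data, seed, and replicate count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(12345)
# Hypothetical data: 50 observations of one variable
data = rng.normal(loc=10.0, scale=2.0, size=50)

replicates = 1000
# One call draws every replicate: shape (replicates, len(data)), sampled with replacement
resamples = rng.choice(data, size=(replicates, len(data)), replace=True)
# Apply the statistic (here the mean) to each replicate, i.e. each row
estimates = resamples.mean(axis=1)

# Simple percentile interval from the bootstrap distribution of the mean
low, high = np.percentile(estimates, [2.5, 97.5])
print(f"mean={data.mean():.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The percentile interval is the simplest bootstrap interval. For skewed statistics, other interval methods may behave better.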