In pandas (Python's closest analogue to R) there are the DataFrame.sample and Series.sample methods, both introduced in version 0.16.1.
For example:
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> df
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0
Sampling 3 rows without replacement:
>>> df.sample(3)
   a  b
4  5  0
1  2  7
3  4  9
Sampling 4 values from column 'a' with replacement, using column 'b' as the corresponding weights for the choices:
>>> df['a'].sample(4, replace=True, weights=df['b'])
3    4
0    1
0    1
2    3
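To illustrate how the weights behave, here is a minimal sketch. The weights do not need to sum to 1 (pandas normalizes them), and a row whose weight is 0 is never drawn. The `random_state` argument and the sample size of 1000 are additions for this illustration only, chosen to make the draw repeatable and the counts meaningful.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})

# Draw many values from column 'a' with replacement, weighted by column 'b'.
# The weights 6, 7, 8, 9, 0 are normalized by pandas to sum to 1.
draws = df['a'].sample(1000, replace=True, weights=df['b'], random_state=0)

# Tally how often each value was drawn; the value 5 (weight 0) never appears.
counts = draws.value_counts()
print(counts)
```

The counts should come out roughly proportional to the weights in column 'b'.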
These methods are almost identical to R's sample() function: they let you sample a particular number of values, or a fraction of the values, from your DataFrame/Series, with or without replacement. Note that the prob argument in R's sample() corresponds to weights in the pandas methods.
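Sampling by fraction rather than by count can be sketched as follows, using the `frac` parameter of the same methods. The `random_state` seed is an addition for this illustration so that the result is repeatable; the values 0.6 and 0 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})

# Sample 60% of the rows (3 of 5) without replacement.
# Roughly analogous to sampling 3 row indices with R's sample().
subset = df.sample(frac=0.6, random_state=0)

# The same seed gives the same rows again.
repeat = df.sample(frac=0.6, random_state=0)

print(subset)
```

Pass either `n` or `frac`, not both; pandas raises an error if both are given.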