Sample each group after pandas groupby

Question

I know this must have been answered somewhere, but I just could not find it.

Problem: sample each group after a groupby operation.

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                   'b': [1,1,1,0,0,0,0]})

grouped = df.groupby('b')

# now sample from each group, e.g., I want 30% of each group

From pandas 1.1, you can just do `df.groupby('b').sample()`. [Relevant docs](https://pandas.pydata.org/docs/dev/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html) – cs95, Jul 29 '20 at 10:27

EdChum · Accepted Answer · 2016-04-03T20:15:21.680

81

Apply a lambda that calls `sample` with the `frac` parameter:

In [2]:
df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                   'b': [1,1,1,0,0,0,0]})

grouped = df.groupby('b')
grouped.apply(lambda x: x.sample(frac=0.3))

Out[2]:
     a  b
b        
0 6  7  0
1 2  3  1
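As a rough check of why one row comes back per group, here is a minimal sketch using the same data. The `random_state=0` seed and the names `sampled` and `rows_per_group` are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# random_state makes the draw repeatable; the value 0 is arbitrary.
sampled = df.groupby('b').apply(lambda x: x.sample(frac=0.3, random_state=0))

# frac is applied to each group separately and the row count is rounded:
# round(0.3 * 3) == 1 for b == 1 and round(0.3 * 4) == 1 for b == 0.
rows_per_group = sampled.groupby(level='b').size()
print(rows_per_group)
```

With groups this small, `frac=0.3` rounds to a single row each, so larger groups are needed to see a visibly proportional sample.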

edited Apr 03 '16 at 20:15

answered Apr 03 '16 at 20:10

EdChum

cs95 · Answer 2 · 2020-07-29T10:26:21.567

50

pandas >= 1.1: `GroupBy.sample`

This works like magic:

# np.random.seed(0)
df.groupby('b').sample(frac=.3) 

   a  b
5  6  0
0  1  1
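If a fixed number of rows per group is wanted instead of a fraction, `GroupBy.sample` also takes `n`. The sketch below assumes pandas >= 1.1; the seed value and the names `sampled` and `counts` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# Draw exactly two rows from each group; random_state=0 is an arbitrary seed
# used only to make the result repeatable.
sampled = df.groupby('b').sample(n=2, random_state=0)
print(sampled)

# The result keeps the original row labels (no group key is added to the index).
counts = sampled['b'].value_counts().to_dict()
print(counts)
```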

pandas <= 1.0.X

You can use `GroupBy.apply` with `sample`. You do not need a lambda; `apply` forwards extra keyword arguments to the function it calls:

df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, frac=.3)

   a  b
5  6  0
0  1  1
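The `group_keys=False` argument is what keeps the original flat index in the output above. A small sketch of the difference, with an arbitrary `random_state` and illustrative variable names that are not from the answer:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# Default group_keys=True: the group label is prepended as an extra index level.
with_keys = df.groupby('b').apply(pd.DataFrame.sample, frac=.3, random_state=0)

# group_keys=False: only the original row labels remain in the index.
without_keys = df.groupby('b', group_keys=False).apply(
    pd.DataFrame.sample, frac=.3, random_state=0)

print(with_keys.index.nlevels, without_keys.index.nlevels)
```

Recent pandas releases may emit a deprecation warning about `apply` operating on the grouping column; the index behaviour shown here is unaffected.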

edited Jul 29 '20 at 10:26

answered Jul 01 '19 at 19:49

cs95

`df.sample(frac=1).groupby('b').head(2)`: this one is not the same. `sample` draws rows uniformly at random, while this one takes the first rows of each group. Which to use depends on the task, but the `head` version depends on the sort order, whereas `sample` does not. – melgor89 Jul 17 '20 at 06:47
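To inspect the two expressions from this comment side by side, a sketch along these lines may help. The seed and the variable names are assumptions for illustration, and `GroupBy.sample` requires pandas >= 1.1.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# Shuffle every row, then keep the first two rows of each group
# in whatever order the shuffle produced.
shuffled_head = df.sample(frac=1, random_state=0).groupby('b').head(2)

# Draw two rows per group directly.
direct = df.groupby('b').sample(n=2, random_state=0)

print(shuffled_head)
print(direct)

head_counts = shuffled_head['b'].value_counts().to_dict()
direct_counts = direct['b'].value_counts().to_dict()
```

Both return a fixed number of rows per group rather than a fraction. What `head` returns is determined by the row order at the moment it is called, which here is set by the preceding shuffle.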
