2

How can I sample groups after a groupby in pandas? For example, say I want to keep only the first half of the groups.

In [194]: df = pd.DataFrame({'name':['john', 'george', 'john','andrew','Daniel','george','andrew','Daniel'], 'hits':[12,34,13,23,53,47,20,48]})
In [196]: grouped = df.groupby('name')

There are 4 groups in grouped ('john', 'george', 'andrew' and 'Daniel'), and I'm interested in getting 2 of the 4. It doesn't matter which 2 groups are returned.

Thank you very much.

Alex Riley
BlueFeet

2 Answers

3

You can sample the names ahead of time and only group the chosen names:

selected_names = np.random.choice(df.name.unique(), 2, replace=False)
grouped = df[df.name.isin(selected_names)].groupby('name')
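As a self-contained sketch of the same idea, the snippet below rebuilds the question's DataFrame and draws the names with a seeded NumPy generator so the sample is repeatable. The seed value and the variable names `rng`, `sampled` and `grouped_sample` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'george', 'john', 'andrew', 'Daniel', 'george', 'andrew', 'Daniel'],
    'hits': [12, 34, 13, 23, 53, 47, 20, 48],
})

# Seeded generator so the draw is repeatable; the seed 0 is an arbitrary choice.
rng = np.random.default_rng(0)

# Draw 2 distinct group keys, then keep only the rows that belong to them.
selected_names = rng.choice(df['name'].unique(), size=2, replace=False)
sampled = df[df['name'].isin(selected_names)]

# Group only the sampled rows.
grouped_sample = sampled.groupby('name')
print(sorted(grouped_sample.groups))
```

Filtering before grouping means the groupby only ever sees the rows of the chosen names, which is usually cheaper than grouping everything and discarding groups afterwards.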
cwharland
0

Thanks for the quick replies, ajcr and cwharland. I was probably not clear about what I want, but your suggestions are great. I did:

# list(...) is needed on Python 3, where dict.keys() returns a view, not a list
choices = np.random.choice(list(grouped.indices.keys()), 2, replace=False)
df[df['name'].isin(choices)]

and got the result I hoped for:

Out[215]: 
   hits    name
0    12    john
2    13    john
3    23  andrew
6    20  andrew

Thank you both!

BlueFeet
  • 1
    There's no need for a `groupby` if that is your desired output. Simply do: `selected_names = np.random.choice(df.name.unique(), 2, replace=False)` followed by `df[df.name.isin(selected_names)]` – cwharland Dec 05 '14 at 15:39
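
If the "first half of the groups" reading of the question is what is wanted, rather than a random draw, one possible sketch uses `GroupBy.ngroup()` to number the groups and keeps the rows whose group number falls in the first half. This is an alternative not proposed in the thread; the names `group_number` and `first_half` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'george', 'john', 'andrew', 'Daniel', 'george', 'andrew', 'Daniel'],
    'hits': [12, 34, 13, 23, 53, 47, 20, 48],
})
grouped = df.groupby('name')

# ngroup() labels every row with the position (0, 1, 2, ...) of its group
# in groupby order, which is sorted key order by default.
group_number = grouped.ngroup()

# Keep only the rows whose group is in the first half of the groups.
first_half = df[group_number < grouped.ngroups // 2]
print(first_half)
```

Because groupby sorts the keys by default and uppercase letters sort before lowercase ones, the first two groups here are 'Daniel' and 'andrew'. Passing `sort=False` to `groupby` would number the groups in order of first appearance instead.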