7

I have a DataFrame that looks like this:

var1     var2    count
 A        abc      4
 A        abc      3
 A        abc      2
 A        abc      1
 A        abc      1
 B        abc      7
 B        abc      5
 B        abc      2
 B        abc      1
 B        abc      1
 C        abc      4
 C        abc      3
 C        abc      2
 C        abc      1
 C        abc      1

 ....

I want to create a new DataFrame with the top 3 'count' values from each var1 group. It should look like this:

     var1     var2    count
      A        abc      4
      A        abc      3
      A        abc      2
      B        abc      7
      B        abc      5
      B        abc      2
      C        abc      4
      C        abc      3
      C        abc      2
      ....

Is there a convenient way to do this in Python using head()?

Feyzi Bagirov

2 Answers

14

Solution with set_index, groupby and SeriesGroupBy.nlargest:

df = df.set_index('var2').groupby("var1")['count'].nlargest(3).reset_index()
print(df)
  var1 var2  count
0    A  abc      4
1    A  abc      3
2    A  abc      2
3    B  abc      7
4    B  abc      5
5    B  abc      2
6    C  abc      4
7    C  abc      3
8    C  abc      2
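
To make the one-liner above reproducible, here is a minimal sketch that rebuilds only the rows shown in the question and applies the same chain. The name top3 is an illustrative choice, used so the original df is not overwritten.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "var1": list("AAAAABBBBBCCCCC"),
    "var2": "abc",
    "count": [4, 3, 2, 1, 1, 7, 5, 2, 1, 1, 4, 3, 2, 1, 1],
})

# Move var2 into the index so it survives the Series-level nlargest,
# then turn both index levels (var1, var2) back into columns.
top3 = (
    df.set_index("var2")
      .groupby("var1")["count"]
      .nlargest(3)
      .reset_index()
)
print(top3)
```

Note that only var1, var2 and count come back: any other column would have to be added to set_index as well, otherwise it is dropped.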
jezrael
8

If the count column is already sorted in descending order within each group, you can use groupby.head to take the first three rows of each group:

df.groupby("var1").head(3)
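
If the rows are not sorted yet, one option is to sort first and then take the head of each group. The sketch below assumes a shuffled copy of the question's data; the names df and top3 are illustrative.

```python
import pandas as pd

# The question's rows, deliberately out of order.
df = pd.DataFrame({
    "var1": ["A", "B", "C", "A", "B", "C", "A", "B", "C",
             "A", "B", "C", "A", "B", "C"],
    "var2": "abc",
    "count": [1, 5, 2, 4, 1, 4, 2, 7, 1, 3, 2, 3, 1, 1, 1],
})

# Sort by group, then by count descending, so the first three rows
# of every group are the three largest counts.
top3 = (
    df.sort_values(["var1", "count"], ascending=[True, False])
      .groupby("var1")
      .head(3)
)
print(top3)
```

head keeps every column and the original index labels, so call reset_index(drop=True) afterwards if a clean 0..n index is wanted.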

Otherwise, group the DataFrame by var1 and use nlargest to retrieve the three rows with the largest counts in each group:

df.groupby("var1", group_keys=False).apply(lambda g: g.nlargest(3, "count"))
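
Some newer pandas versions emit a deprecation warning when apply operates on the grouping column, so here is a sketch of a variant that avoids apply. It is not the approach from the answer itself: it runs nlargest on the count Series per group and uses the resulting index labels to select whole rows. It assumes the DataFrame index is unique; df and top3 are illustrative names.

```python
import pandas as pd

# The question's rows, deliberately out of order.
df = pd.DataFrame({
    "var1": ["A", "B", "C", "A", "B", "C", "A", "B", "C",
             "A", "B", "C", "A", "B", "C"],
    "var2": "abc",
    "count": [1, 5, 2, 4, 1, 4, 2, 7, 1, 3, 2, 3, 1, 1, 1],
})

# SeriesGroupBy.nlargest returns a Series indexed by (var1, original row label).
largest = df.groupby("var1")["count"].nlargest(3)

# Use the original row labels (second index level) to pull the full rows.
top3 = df.loc[largest.index.get_level_values(1)]
print(top3)
```

Unlike the set_index approach, this keeps every column of the original frame without listing them.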

Psidom