pandas - how to extract top three rows from the dataframe provided

Question

I compute the following grouped count from my pandas DataFrame df:

grouped = df[(df['X'] == 'venture') & (df['company_code'].isin(['TDS','XYZ','UVW']))].groupby(['company_code','sector'])['X_sector'].count()

The output of this is as follows:

company_code  sector                            
TDS           Meta                                 404
              Electrical                           333
              Mechanical                           533
              Agri                                 453
XYZ           Sports                               331
              Electrical                           354
              Movies                               375
              Manufacturing                        355            
UVW           Sports                               505
              Robotics                             345
              Movies                               56
              Health                               3263
              Manufacturing                        456
              Others                               524
Name: X_sector, dtype: int64

What I want to get is the top three sectors within the company codes. What is the way to do it?

@RahulAgarwal - I have shared the code used to obtain the output. Given that it is a Series, I tried: grouped.sort_values(axis=0, ascending=False). This breaks up the grouping in the output. — The Roy, Oct 07 '18 at 13:46
See this: https://stackoverflow.com/questions/35364601/group-by-and-find-top-n-value-counts-pandas — Pygirl, Oct 07 '18 at 13:48
Try this: grouped = df[(df['X'] == 'venture') & (df['company_code'].isin(['USA','IND','GBR']))].groupby(['company_code','main_sector'])['X_sector'].count().reset_index(name='count').sort_values(['count'],ascending=False) — Pygirl, Oct 07 '18 at 14:02
This works. But it gives sorted order for count. This is one part. The second part is to select top two from each of these group arrangements. — The Roy, Oct 07 '18 at 14:08

score 4 · Accepted Answer · answered Oct 07 '18 at 14:09

You will have to chain a second groupby here: sort the counts, group again by the first index level, and take the head of each group. Consider this example:

import pandas as pd
import numpy as np

np.random.seed(111)

names = [
    'Robert Baratheon',
    'Jon Snow',
    'Daenerys Targaryen',
    'Theon Greyjoy',
    'Tyrion Lannister'
]

df = pd.DataFrame({
    'season': np.random.randint(1, 7, size=100),
    'actor': np.random.choice(names, size=100),
    'appearance': 1
})

s = df.groupby(['season','actor'])['appearance'].count()
print(s.sort_values(ascending=False).groupby('season').head(1)) # <-- head(3) for 3 values

Returns:

season  actor             
4       Daenerys Targaryen    7
6       Robert Baratheon      6
3       Robert Baratheon      6
5       Jon Snow              5
2       Theon Greyjoy         5
1       Jon Snow              4

where s is (output truncated at season 4):

season  actor             
1       Daenerys Targaryen    2
        Jon Snow              4
        Robert Baratheon      2
        Theon Greyjoy         3
        Tyrion Lannister      4
2       Daenerys Targaryen    4
        Jon Snow              3
        Robert Baratheon      1
        Theon Greyjoy         5
        Tyrion Lannister      3
3       Daenerys Targaryen    2
        Jon Snow              1
        Robert Baratheon      6
        Theon Greyjoy         3
        Tyrion Lannister      3
4 ...
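
Applied to the Series from the question, the same chain can be sketched as follows. The original DataFrame is not available, so the Series is rebuilt by hand from the counts shown in the question; that reconstruction, and the variable name top3, are assumptions made for illustration only.

```python
import pandas as pd

# Counts copied from the question's printed output. The Series is rebuilt
# by hand because the asker's DataFrame is not available.
counts = {
    ("TDS", "Meta"): 404,
    ("TDS", "Electrical"): 333,
    ("TDS", "Mechanical"): 533,
    ("TDS", "Agri"): 453,
    ("XYZ", "Sports"): 331,
    ("XYZ", "Electrical"): 354,
    ("XYZ", "Movies"): 375,
    ("XYZ", "Manufacturing"): 355,
    ("UVW", "Sports"): 505,
    ("UVW", "Robotics"): 345,
    ("UVW", "Movies"): 56,
    ("UVW", "Health"): 3263,
    ("UVW", "Manufacturing"): 456,
    ("UVW", "Others"): 524,
}
grouped = pd.Series(counts, name="X_sector")
grouped.index.names = ["company_code", "sector"]

# Sort all counts in descending order, then keep the first three rows
# of each company_code group.
top3 = (
    grouped.sort_values(ascending=False)
    .groupby(level="company_code")
    .head(3)
)
print(top3)
```

The result is ordered by count across all companies, not by company. If one block per company is preferred, calling top3.reset_index(name="count") and sorting by company_code and count should give that layout.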

@Learner Glad I could help. Happy coding! — Anton vBR, Oct 07 '18 at 14:13
@AntonvBR, nice example! +1 — Karn Kumar, Oct 07 '18 at 16:58

score 0 · Answer 2 · answered Mar 29 '21 at 20:13

There is no need to make this complicated; a simpler one-liner is possible:

Z = df.groupby('country_code')['sector'].value_counts().groupby(level=0).head(3).sort_values(ascending=False).to_frame('counts').reset_index()

Z
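
As a rough check of how this one-liner behaves, here is a minimal sketch on a made-up DataFrame. The country codes, sectors, and row counts below are hypothetical and not taken from the question. It relies on value_counts returning counts in descending order within each group, so head(3) on each group keeps the three largest.

```python
import pandas as pd

# Hypothetical raw rows: one row per record, as value_counts expects.
rows = (
    [("AAA", "Sports")] * 4
    + [("AAA", "Movies")] * 3
    + [("AAA", "Health")] * 2
    + [("AAA", "Others")] * 1
    + [("BBB", "Robotics")] * 5
    + [("BBB", "Agri")] * 3
    + [("BBB", "Meta")] * 2
    + [("BBB", "Sports")] * 1
)
df = pd.DataFrame(rows, columns=["country_code", "sector"])

top3 = (
    df.groupby("country_code")["sector"]
    .value_counts()      # counts per (country_code, sector), descending within each group
    .groupby(level=0)    # regroup by country_code
    .head(3)             # keep the three largest counts per country
    .to_frame("counts")
    .reset_index()
)
print(top3)
```

Unlike the accepted answer, this starts from the raw rows and not from an already aggregated Series, so any filtering (such as the X == 'venture' condition in the question) would have to be applied to df first.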

Answered by Manoj Kumar
