I have a pandas DataFrame df from which I compute a grouped count like this:

grouped = df[(df['X'] == 'venture') & (df['company_code'].isin(['TDS','XYZ','UVW']))].groupby(['company_code','sector'])['X_sector'].count()

The output of this is as follows:

company_code  sector                            
TDS           Meta                                 404
              Electrical                           333
              Mechanical                           533
              Agri                                 453
XYZ           Sports                               331
              Electrical                           354
              Movies                               375
              Manufacturing                        355            
UVW           Sports                               505
              Robotics                             345
              Movies                               56
              Health                               3263
              Manufacturing                        456
              Others                               524
Name: X_sector, dtype: int64

What I want is the top three sectors (by count) within each company code. How can I do that?

The Roy
  • @RahulAgarwal - I have shared the code used to obtain the output. Since the result is a Series, I tried grouped.sort_values(axis=0, ascending=False), but this changes the grouped ordering of the output. – The Roy Oct 07 '18 at 13:46
  • See this: https://stackoverflow.com/questions/35364601/group-by-and-find-top-n-value-counts-pandas – Pygirl Oct 07 '18 at 13:48
  • Try this: grouped = df[(df['X'] == 'venture') & (df['company_code'].isin(['USA','IND','GBR']))].groupby(['company_code','main_sector'])['X_sector'].count().reset_index(name='count').sort_values(['count'],ascending=False) – Pygirl Oct 07 '18 at 14:02
  • This works, but it only sorts by count. That is one part; the second part is to select the top two from each of these groups. – The Roy Oct 07 '18 at 14:08

2 Answers

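You need to chain a second groupby: sort the counts in descending order, group again by the outer index level, and take the head of each group. Consider this example: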

import pandas as pd
import numpy as np

np.random.seed(111)

names = [
    'Robert Baratheon',
    'Jon Snow',
    'Daenerys Targaryen',
    'Theon Greyjoy',
    'Tyrion Lannister'
]

df = pd.DataFrame({
    'season': np.random.randint(1, 7, size=100),
    'actor': np.random.choice(names, size=100),
    'appearance': 1
})

s = df.groupby(['season','actor'])['appearance'].count()
print(s.sort_values(ascending=False).groupby('season').head(1))  # use head(3) for the top three per season

Returns:

season  actor             
4       Daenerys Targaryen    7
6       Robert Baratheon      6
3       Robert Baratheon      6
5       Jon Snow              5
2       Theon Greyjoy         5
1       Jon Snow              4

where s is (output truncated at season 4):

season  actor             
1       Daenerys Targaryen    2
        Jon Snow              4
        Robert Baratheon      2
        Theon Greyjoy         3
        Tyrion Lannister      4
2       Daenerys Targaryen    4
        Jon Snow              3
        Robert Baratheon      1
        Theon Greyjoy         5
        Tyrion Lannister      3
3       Daenerys Targaryen    2
        Jon Snow              1
        Robert Baratheon      6
        Theon Greyjoy         3
        Tyrion Lannister      3
4 ...
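
Applied to the Series from the question, the same pattern could look like the sketch below. The counts are copied from the question's printed output; building grouped by hand is only a stand-in for the original filtered groupby, and top3 is a name chosen here for illustration.

```python
import pandas as pd

# Counts copied from the question's printed output. Constructing the
# Series by hand is an assumption standing in for the real groupby.
counts = {
    ('TDS', 'Meta'): 404, ('TDS', 'Electrical'): 333,
    ('TDS', 'Mechanical'): 533, ('TDS', 'Agri'): 453,
    ('XYZ', 'Sports'): 331, ('XYZ', 'Electrical'): 354,
    ('XYZ', 'Movies'): 375, ('XYZ', 'Manufacturing'): 355,
    ('UVW', 'Sports'): 505, ('UVW', 'Robotics'): 345,
    ('UVW', 'Movies'): 56, ('UVW', 'Health'): 3263,
    ('UVW', 'Manufacturing'): 456, ('UVW', 'Others'): 524,
}
index = pd.MultiIndex.from_tuples(list(counts), names=['company_code', 'sector'])
grouped = pd.Series(list(counts.values()), index=index, name='X_sector')

# Sort all counts in descending order, then keep the first three rows
# that belong to each company_code.
top3 = grouped.sort_values(ascending=False).groupby(level='company_code').head(3)
print(top3)
```

The result stays ordered by count across all companies, so rows from different company codes are interleaved; sort by company_code afterwards if you want them grouped together again.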
Anton vBR

This can be done more simply. value_counts() already sorts the counts in descending order within each group, so taking head(3) per group gives the top three sectors:

Z = df.groupby('company_code')['sector'].value_counts().groupby(level=0).head(3).sort_values(ascending=False).to_frame('counts').reset_index()

Z
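
To see why this works, here is a minimal sketch on made-up rows (the data below is hypothetical, not the asker's, and the final sort_values step is left out so the per-group ordering stays visible):

```python
import pandas as pd

# Hypothetical raw data: one row per record, four sectors per company.
df = pd.DataFrame({
    'company_code': ['TDS'] * 10 + ['XYZ'] * 10,
    'sector': (['Meta'] * 4 + ['Agri'] * 3 + ['Electrical'] * 2 + ['Mechanical']
               + ['Sports'] * 4 + ['Movies'] * 3 + ['Robotics'] * 2 + ['Health']),
})

# value_counts() orders each company's sectors from most to least frequent.
counts = df.groupby('company_code')['sector'].value_counts()

# head(3) per company_code therefore keeps the three most frequent sectors.
top3 = counts.groupby(level=0).head(3).to_frame('counts').reset_index()
print(top3)
```

Note that this counts every row of df. To reproduce the question's numbers, the filter on X and company_code would presumably have to be applied to df first.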
Manoj Kumar