This question is a follow-up from this other question.
I've got this dataframe (slightly different from the one in the other question - I added a "state" column now):
person_code #CNAE growth size state
0 231 32 0.54 32 FR
1 233 43 0.12 333 LK
2 432 32 0.44 21 FR
3 431 56 0.32 23 KS
4 654 89 0.12 89 FR
5 764 32 0.20 211 TI
6 434 32 0.82 90 TI
Each row is a different person. I wanted to create a df that looks like this:
person_code #CNAE growth size state top3growth ...
0 . 231 32 0.54 32 FR [dict_top3_type_32]
1 . 233 43 0.12 333 LK [dict_top3_type_43]
2 . 432 32 0.44 21 FR [dict_top3_type_32]
3 . 431 56 0.32 23 KS [dict_top3_type_56]
4 . 654 89 0.12 89 FR [dict_top3_type_89]
5 . 764 32 0.20 211 TI [dict_top3_type_32]
6 . 434 32 0.82 90 TI [dict_top3_type_32]
For different reasons we've agreed that the best way was to create different dictionaries. Each dict would comprise the 3 persons with the same "#CNAE" that had the largest "growth" rate. They looked like this:
top3growth
0 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
1 {'person_code': {1: 233}, 'growth': {1: 0.12},...
2 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
3 {'person_code': {3: 431}, 'growth': {3: 0.32},...
4 {'person_code': {4: 654}, 'growth': {4: 0.12},...
5 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
6 {'person_code': {0: 231, 2: 432, 6: 434}, 'gro...
I was able to achieve that thanks to the excellent solution suggested by @Wen:
a=df.groupby('#CNAE',group_keys=False).apply(pd.DataFrame.nlargest,n=3,columns='growth')
df['top3growth']=df['#CNAE'].map(a.groupby('#CNAE').apply(lambda x : x.to_dict()))
My problem now is that I need to create a new column of dictionaries. But now I need to find the 3 largest growth persons that have the same "#CNAE" and "state" (not only the same #CNAE).
I've made a few attempts trying to groupby '#CNAE' and 'state', like this one below, but it didn't work out:
a=df.groupby(['#CNAE','state'],group_keys=False).apply(pd.DataFrame.nlargest,n=3,columns='growth')
df['top3growthCNAEstate']=df[['#CNAE','state']].map(a.groupby(['#CNAE','state']).apply(lambda x : x.to_dict()))
I understand the solution would look much like this but I can't get it to work out. Would anyone have a suggestion?