I've got this dataframe:
person_code #CNAE growth size
0 231 32 0.54 32
1 233 43 0.12 333
2 432 32 0.44 21
3 431 56 0.32 23
4 654 89 0.12 89
5 764 32 0.20 211
6 434 32 0.82 90
I need to create a new column called "top3growth". For each row, it should look up that row's #CNAE and hold the 3 persons with the highest growth for that CNAE, so each cell would contain a small DataFrame nested inside df. To create these "top3dfs" I'm using this groupby:
a = df.groupby('#CNAE', group_keys=False).apply(pd.DataFrame.nlargest, n=3, columns='growth')
(This solution came out of this question.)
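For anyone who wants to reproduce this, here is a sketch that rebuilds the sample data above and selects the same top-3 rows per `#CNAE`. It uses `sort_values` plus `groupby(...).head(3)` instead of `apply(pd.DataFrame.nlargest, ...)`. That is a swapped-in idiom, not the one from the linked question. I'm assuming it is preferable here because `groupby.head` keeps every column (including `#CNAE`) and the original index, while recent pandas versions warn when `groupby.apply` operates on the grouping column.

```python
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'person_code': [231, 233, 432, 431, 654, 764, 434],
    '#CNAE': [32, 43, 32, 56, 89, 32, 32],
    'growth': [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    'size': [32, 333, 21, 23, 89, 211, 90],
})

# Sort once by growth (highest first), then keep the first 3 rows of each #CNAE group.
# groupby.head returns a subset of the sorted frame, so all columns
# (including '#CNAE') and the original index labels are preserved.
a = df.sort_values('growth', ascending=False).groupby('#CNAE').head(3)
print(a)
```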
It should look like this:
person_code #CNAE growth size top3growth ...
0 231 32 0.54 32 [df_top3_type_32]
1 233 43 0.12 333 [df_top3_type_43]
2 432 32 0.44 21 [df_top3_type_32]
3 431 56 0.32 23 [df_top3_type_56]
4 654 89 0.12 89 [df_top3_type_89]
5 764 32 0.20 211 [df_top3_type_32]
6 434 32 0.82 90 [df_top3_type_32]
...
df_top3_type_32 should look like this (for example):
person_code #CNAE growth size
6 434 32 0.82 90
0 231 32 0.54 32
2 432 32 0.44 21
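One way to get each df_top3_type_XX object is to iterate over the top-3 rows grouped by `#CNAE` and keep one small DataFrame per code. In this sketch `a` is built with `sort_values` + `groupby.head(3)`, which I'm assuming selects the same rows as the `nlargest` groupby above. The dict name `top3` is hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'person_code': [231, 233, 432, 431, 654, 764, 434],
    '#CNAE': [32, 43, 32, 56, 89, 32, 32],
    'growth': [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    'size': [32, 333, 21, 23, 89, 211, 90],
})

# Top-3 rows per #CNAE, already ordered by growth (highest first)
a = df.sort_values('growth', ascending=False).groupby('#CNAE').head(3)

# Hypothetical lookup: CNAE code -> its top-3 sub-DataFrame.
# Iterating a groupby yields (key, group) pairs; row order within a group is kept.
top3 = {cnae: group for cnae, group in a.groupby('#CNAE')}

# Equivalent of df_top3_type_32
print(top3[32])
```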
I'm trying to solve my problem by using:
df['top3growth']=np.nan
for i in df.index:
    df['top3growth'].loc[i]=a[a['#CNAE'] == df['#CNAE'].loc[i]]
But I'm getting:
ValueError: Incompatible indexer with DataFrame
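A minimal reproduction (a sketch assuming only pandas and NumPy): the loop assigns a whole DataFrame to a single label of a Series through `.loc`. When the value is a DataFrame, pandas tries to align it against the target, and a single element of a 1-D Series gives it nothing to align to, so it raises. The names `s` and `small` are hypothetical stand-ins for `df['top3growth']` and one filtered slice of `a`.

```python
import numpy as np
import pandas as pd

# Stand-in for df['top3growth'] after df['top3growth'] = np.nan (a float column)
s = pd.Series([np.nan, np.nan, np.nan])

# Stand-in for a[a['#CNAE'] == ...], i.e. a small DataFrame
small = pd.DataFrame({'person_code': [434, 231, 432]})

try:
    # Same pattern as df['top3growth'].loc[i] = a[...]
    s.loc[0] = small
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # Incompatible indexer with DataFrame
```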
Does anyone know what's going on? Is there a more efficient way of doing this (not using a for loop)?
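A loop-free sketch, assuming the goal really is to store a DataFrame in each cell: build the per-CNAE sub-DataFrames once, then `map` every row's `#CNAE` to its group. Because the cells hold Python objects, the column gets `object` dtype. If only the identities of the top-3 persons are needed, mapping to a list of `person_code` values is lighter. That part is a swapped-in alternative, and the column name `top3codes` is hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'person_code': [231, 233, 432, 431, 654, 764, 434],
    '#CNAE': [32, 43, 32, 56, 89, 32, 32],
    'growth': [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    'size': [32, 333, 21, 23, 89, 211, 90],
})

# Top-3 rows per #CNAE, keeping all columns and the original index
a = df.sort_values('growth', ascending=False).groupby('#CNAE').head(3)

# Build each per-CNAE DataFrame once (hypothetical dict name)
top3 = {cnae: group for cnae, group in a.groupby('#CNAE')}

# No explicit loop: Series.map calls the function once per row and
# stores the returned DataFrame in that row's cell (object dtype).
df['top3growth'] = df['#CNAE'].map(lambda cnae: top3[cnae])

# Lighter alternative (hypothetical column 'top3codes'):
# one list of person_code values per CNAE, mapped onto each row.
codes_per_cnae = a.groupby('#CNAE')['person_code'].agg(list)
df['top3codes'] = df['#CNAE'].map(codes_per_cnae)

print(df[['person_code', '#CNAE', 'top3codes']])
```

Note that every row with the same #CNAE references the same DataFrame object in `top3growth`, so modifying one cell's DataFrame in place would change what the other rows see.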