how to crate the group by in pandas only in one level

Question

I am importing below df3 dataframe in my excel file and want to grouby only Name and rest dublicate data should reflect as below .

Note (Each Month data will be added as per month wise. )

Df3 =pd.read_Excel('Data')
print (df3)      

Name    ID  Month   Shift

Jon     1   Feb     A
Jon     1   Jan     B
Jon     1   Mar     C
Mike    1   Jan     A
Mike    1   Jan     B
Jon     1   Feb     C
Jon     1   Jan     A

and i want to have output like as below in the same formate . Please help me on same as im stuck here .

Will be greatfull for help and support .

`df.set_index(['Name','Month']).to_excel(path_to_excel_file)` is a better format IMO — anky, Jun 27 '20 at 16:21
Does this answer your question? [not able to groupby by one level in My dataframe by pandas](https://stackoverflow.com/questions/62613892/not-able-to-groupby-by-one-level-in-my-dataframe-by-pandas) — Littin Rajan, Jun 28 '20 at 13:30

score 5 · Accepted Answer · edited Jun 27 '20 at 18:46

5

You can achieve it by

df=df.iloc[pd.to_datetime(df.Month,format='%b').argsort()]
df=pd.concat([pd.DataFrame({'Month':[x] }).append(y).fillna('').append(pd.DataFrame(dict.fromkeys(y.columns,['']))) for x , y in df.groupby('Name')]).drop('Name',1).iloc[:-1]

print(df)

 Month ID Shift
0   Jon         
1   Jan  1     B
6   Jan  1     A
0   Feb  1     A
5   Feb  1     C
2   Mar  1     C
0               
0  Mike         
3   Jan  1     A
4   Jan  1     B

edited Jun 27 '20 at 18:46

Umar.H

22,559
7
39
74

answered Jun 27 '20 at 16:23

BENY

317,841
20
164
234

@YOBNE_S: .. Could you please check as Month is not assending . it coming first Feb then Jan then Mar. If you will see my out put ..it like Jan, Jan , Feb and Mar ... for each name.. could u please help me on the same – Manoj Kumar Jun 27 '20 at 16:29
@YOBNE_S : Really thanks for this but when i am runing i have error at first step .am 1st im reading my file and creating df (as Dataframe-df=pd.read_excel(r'C:\Users\dell\Desktop\stats.xlsx',sheet_name='Sheet5')) .. then i am follwoing you coder...df.iloc[pd.to_datetime(df.Month,format='%b').argsort()].. it showing error as "AttributeError: 'DataFrame' object has no attribute 'Month'".....Could you please check .. – Manoj Kumar Jun 27 '20 at 17:19
@YOBNE_S Please help me – Manoj Kumar Jun 27 '20 at 18:37
@ManojKumar the solution to your question is here, Wen has provided a great answer for you. – Umar.H Jun 27 '20 at 18:49
@Datanovice thats not working at all .. im stcuk .. well thanks for degrade me :(.. Please read my commets as i asked to Yobne again . i dont know how he is getting the same . please help me as i am really stuck here . i really did alot calculation and now i have only need to present like this and honestly i have faild to do so so requesting . please help me .. – Manoj Kumar Jun 27 '20 at 18:49
@Datanovice ..i am importing from Excel beacuse this table in my excel file and im importing as pd.read_excel and then im running above codes and thats not working could u please check . – Manoj Kumar Jun 27 '20 at 19:08
this the error from Yobin o accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'. – Manoj Kumar Jun 28 '20 at 11:22
@Datanovice hope you doing great... just cheked another thing .. have you seen below result .. still Jan is reflecting after Mar for Jon and also i tried to add the data for april for both person thats also showing sapertly :( .. pleut ase help me on this .. i voted that its working but needs to further investigate ... Pleae help me – Manoj Kumar Jun 28 '20 at 20:30

Umar.H · Answer 2 · 2020-06-27T19:53:56.903

1

Heres another solution using a list comp and df.duplicated with .loc for assignment.

import numpy as np
df = pd.read_excel(file,sheet_name=yoursheet)

#order the months. 

df['Month'] = pd.Categorical(df['Month'],
               pd.to_datetime(df['Month'],format='%b').drop_duplicates().sort_values().dt.strftime('%b'))



df = df.sort_values(['Month']).reset_index(drop=True)

df1 = pd.concat([data.append(data.iloc[0]) for name,data in df.groupby('Name')])

df1.loc[df1.duplicated(keep='last'),1:] = ''

df1['Name'] = np.where(df1['Month'].ne(''),df1['Month'],df1['Name'])

final = df1.drop('Month',1)

   Name ID Shift
0   Jon         
3   Jan  1     A
4   Feb  1     A
5   Feb  1     C
6   Mar  1     C
0   Jan  1     B
1  Mike         
2   Jan  1     B
1   Jan  1     A

edited Jun 27 '20 at 19:53

answered Jun 27 '20 at 19:21

Umar.H

22,559
7
39
74

well when i tried doing this step------- df1['Name'] = np.where(df1['Month'].ne(''),df1['Month'],df1['Name']). i have error as NameError: name 'np' is not defined. however i imported Numpy too – Manoj Kumar Jun 27 '20 at 19:49
@ManojKumar see above. – Umar.H Jun 27 '20 at 19:54
Really thank you so much for helping me ... but when i am runing this code df['Month'] = pd.Categorical(df['Month'], pd.to_datetime(df['Month'],format='%b').drop_duplicates().sort_values().dt.strftime('%b')).... showing error ... :( .. the error says "ValueError: unconverted data remains: " – Manoj Kumar Jun 27 '20 at 19:59
well i am sorry that code is working as there was space in between so i removedt and thats working but now i have only error with this code "df1['Name'] = np.where(df1['Month'].ne(''),df1['Month'],df1['Name'])" the error says NameError: name 'np' is not defined... dont know what to do this .. – Manoj Kumar Jun 27 '20 at 20:06
1

@ManojKumar no problem don't forget to accept YoBen's answer – Umar.H Jun 27 '20 at 20:34
@Databovice.. well i added some data for Jon and Mike Apr and Feb then its not working as the way it was working yesterday . the one i added new data for Jon and Mike for Apr and Feb . shwoing sepreatly as group .. which is not corret . ..Could u pls help ... – Manoj Kumar Jun 28 '20 at 11:13

score 0 · Answer 3 · answered Jun 27 '20 at 19:03

Code

#creating sample data as per requirement
import pandas as pd 
df = pd.DataFrame({'Name':['Jon','Jon','Jon','Mike','Mike','Jon','Jon'],'ID':[1,1,1,1,1,1,1], 'Month':['Feb','Jan','Mar','Jan','Jan','Feb','Jan'], 'Shift':['A','B','C','A','B','C','A']})
#display data
df

df['Month'] = pd.Categorical(df['Month'],categories=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],ordered=True)
df = df.sort_values(['Name','Month']).reset_index(drop=True)
#display final data
df

I hope this would be helpful... : )

how to crate the group by in pandas only in one level

3 Answers3

Linked