1

I am importing below df3 dataframe in my excel file and want to grouby only Name and rest dublicate data should reflect as below .

Note (Each Month data will be added as per month wise. )

Df3 =pd.read_Excel('Data')
print (df3)      

Name    ID  Month   Shift

Jon     1   Feb     A
Jon     1   Jan     B
Jon     1   Mar     C
Mike    1   Jan     A
Mike    1   Jan     B
Jon     1   Feb     C
Jon     1   Jan     A

enter image description here

and i want to have output like as below in the same formate . Please help me on same as im stuck here .

enter image description here

Will be greatfull for help and support .

anky
  • 74,114
  • 11
  • 41
  • 70

3 Answers3

5

You can achieve it by

df=df.iloc[pd.to_datetime(df.Month,format='%b').argsort()]
df=pd.concat([pd.DataFrame({'Month':[x] }).append(y).fillna('').append(pd.DataFrame(dict.fromkeys(y.columns,['']))) for x , y in df.groupby('Name')]).drop('Name',1).iloc[:-1]

print(df)

 Month ID Shift
0   Jon         
1   Jan  1     B
6   Jan  1     A
0   Feb  1     A
5   Feb  1     C
2   Mar  1     C
0               
0  Mike         
3   Jan  1     A
4   Jan  1     B
Umar.H
  • 22,559
  • 7
  • 39
  • 74
BENY
  • 317,841
  • 20
  • 164
  • 234
  • @YOBNE_S: .. Could you please check as Month is not assending . it coming first Feb then Jan then Mar. If you will see my out put ..it like Jan, Jan , Feb and Mar ... for each name.. could u please help me on the same – Manoj Kumar Jun 27 '20 at 16:29
  • @YOBNE_S : Really thanks for this but when i am runing i have error at first step .am 1st im reading my file and creating df (as Dataframe-df=pd.read_excel(r'C:\Users\dell\Desktop\stats.xlsx',sheet_name='Sheet5')) .. then i am follwoing you coder...df.iloc[pd.to_datetime(df.Month,format='%b').argsort()].. it showing error as "AttributeError: 'DataFrame' object has no attribute 'Month'".....Could you please check .. – Manoj Kumar Jun 27 '20 at 17:19
  • @YOBNE_S Please help me – Manoj Kumar Jun 27 '20 at 18:37
  • @ManojKumar the solution to your question is here, Wen has provided a great answer for you. – Umar.H Jun 27 '20 at 18:49
  • @Datanovice thats not working at all .. im stcuk .. well thanks for degrade me :(.. Please read my commets as i asked to Yobne again . i dont know how he is getting the same . please help me as i am really stuck here . i really did alot calculation and now i have only need to present like this and honestly i have faild to do so so requesting . please help me .. – Manoj Kumar Jun 27 '20 at 18:49
  • @Datanovice ..i am importing from Excel beacuse this table in my excel file and im importing as pd.read_excel and then im running above codes and thats not working could u please check . – Manoj Kumar Jun 27 '20 at 19:08
  • this the error from Yobin o accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'. – Manoj Kumar Jun 28 '20 at 11:22
  • @Datanovice hope you doing great... just cheked another thing .. have you seen below result .. still Jan is reflecting after Mar for Jon and also i tried to add the data for april for both person thats also showing sapertly :( .. pleut ase help me on this .. i voted that its working but needs to further investigate ... Pleae help me – Manoj Kumar Jun 28 '20 at 20:30
1

Heres another solution using a list comp and df.duplicated with .loc for assignment.

import numpy as np
df = pd.read_excel(file,sheet_name=yoursheet)

#order the months. 

df['Month'] = pd.Categorical(df['Month'],
               pd.to_datetime(df['Month'],format='%b').drop_duplicates().sort_values().dt.strftime('%b'))



df = df.sort_values(['Month']).reset_index(drop=True)

df1 = pd.concat([data.append(data.iloc[0]) for name,data in df.groupby('Name')])

df1.loc[df1.duplicated(keep='last'),1:] = ''

df1['Name'] = np.where(df1['Month'].ne(''),df1['Month'],df1['Name'])

final = df1.drop('Month',1)

   Name ID Shift
0   Jon         
3   Jan  1     A
4   Feb  1     A
5   Feb  1     C
6   Mar  1     C
0   Jan  1     B
1  Mike         
2   Jan  1     B
1   Jan  1     A
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • well when i tried doing this step------- df1['Name'] = np.where(df1['Month'].ne(''),df1['Month'],df1['Name']). i have error as NameError: name 'np' is not defined. however i imported Numpy too – Manoj Kumar Jun 27 '20 at 19:49
  • @ManojKumar see above. – Umar.H Jun 27 '20 at 19:54
  • Really thank you so much for helping me ... but when i am runing this code df['Month'] = pd.Categorical(df['Month'], pd.to_datetime(df['Month'],format='%b').drop_duplicates().sort_values().dt.strftime('%b')).... showing error ... :( .. the error says "ValueError: unconverted data remains: " – Manoj Kumar Jun 27 '20 at 19:59
  • well i am sorry that code is working as there was space in between so i removedt and thats working but now i have only error with this code "df1['Name'] = np.where(df1['Month'].ne(''),df1['Month'],df1['Name'])" the error says NameError: name 'np' is not defined... dont know what to do this .. – Manoj Kumar Jun 27 '20 at 20:06
  • 1
    @ManojKumar no problem don't forget to accept YoBen's answer – Umar.H Jun 27 '20 at 20:34
  • @Databovice.. well i added some data for Jon and Mike Apr and Feb then its not working as the way it was working yesterday . the one i added new data for Jon and Mike for Apr and Feb . shwoing sepreatly as group .. which is not corret . ..Could u pls help ... – Manoj Kumar Jun 28 '20 at 11:13
0

Code

#creating sample data as per requirement
import pandas as pd 
df = pd.DataFrame({'Name':['Jon','Jon','Jon','Mike','Mike','Jon','Jon'],'ID':[1,1,1,1,1,1,1], 'Month':['Feb','Jan','Mar','Jan','Jan','Feb','Jan'], 'Shift':['A','B','C','A','B','C','A']})
#display data
df

enter image description here

df['Month'] = pd.Categorical(df['Month'],categories=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],ordered=True)
df = df.sort_values(['Name','Month']).reset_index(drop=True)
#display final data
df

enter image description here

I hope this would be helpful... : )

Littin Rajan
  • 852
  • 1
  • 10
  • 21