This is what I received from spreadsheet:
import pandas as pd
import numpy as np
arrays = [['Phase 1','Phase 1','Phase 1','Phase 1','Phase 1','Phase 1','Phase 1','Phase 1','Phase 1'],
['Function A','Function A','Function A','Function A','Function A','Function A','Function A','Function A'],
['Achieved on','Achieved on','Achieved on','Achieved on','Achieved on','Achieved on','Planned for', 'Due Date'],
['Deliverable 1','Status','True?','Deliverable 2','Status.1','True?.1','NaN','NaN']]
tuples=list(zip(*arrays))
index= pd.MultiIndex.from_tuples(tuples, names=['first','second','third','Project'])
s1=pd.Series(['10/10/2020','Updated','Yes','11/10/2020','Pending','','',''],
index=index)
reset_df=s1.reset_index()
df=pd.DataFrame(reset_df,index=['Project A', 'Project B'], columns=index)
df2=pd.Series(['10/10/2020','Updated','Yes','11/10/2020','Pending','','',''],
index=index)
df3=pd.Series(['06/06/2021','Issued','','','','','',''],index=index)
df=df.append([df2,df3], ignore_index=True)
df=df.drop([0,1])
df
Resulting in:
first Phase 1
second Function A
third Achieved on Planned for Due Date
Project Deliverable 1 Status True? Deliverable 2 Status.1 True?.1 NaN NaN
2 10/10/2020 Updated Yes 11/10/2020 Pending
3 06/06/2021 Issued
I'm trying to get this output:
data = {'Project':['2','2','3','3'],
'Phase':['Phase 1','Phase 1','Phase 1','Phase 1'],
'Function':['Function A','Function A','Function A','Function A'],
'Requisite':['Deliverable 1','Deliverable 2','Deliverable 1','Deliverable 2'],
'Achieved on':['10/10/2020','11/10/2020','06/06/2021',''],
'Status':['Updated','Pending','Issued',''],
'True?':['Yes','','','']}
output=pd.DataFrame(data)
output
Output is:
Project Phase Function Requisite Achieved on Status True?
0 2 Phase 1 Function A Deliverable 1 10/10/2020 Updated Yes
1 2 Phase 1 Function A Deliverable 2 11/10/2020 Pending
2 3 Phase 1 Function A Deliverable 1 06/06/2021 Issued
3 3 Phase 1 Function A Deliverable 2
Many posts here have me help a lot, but it seems so complex these operations in pandas.
I tried:
- working only with multiindex column level 4 when I read this df from excel (to minimize complexity), but it wasn't helpful.
- from this point, I tried to Stack 2 columns ("Status" and "True?") while use Melt to unpivot columns which have "Deliverables". Of course, it didn't work properly.
thanks in advance for any contribution,