Input df:
ID Date Start End
ABC 2012 15 17
ABC 2013 11 20
XYZ 2012 30 50
XYZ 2014 50 100
Expected Output outputdf:
ID Date Variable Value
ABC 2012 Start 15
ABC 2012 End 17
ABC 2013 Start 11
ABC 2013 End 20
XYZ 2012 Start 30
XYZ 2012 End 50
XYZ 2014 Start 50
XYZ 2014 End 100
Code:
columns=['ID','Date','Variable','Value']
outputdf=pd.DataFrame(index=range(len(df)*2),columns=columns)
tempPos=0
for pos,id in enumerate(df['ID']):
outputdf.loc[[tempPos,tempPos+1],'ID']=df.loc[pos,'ID']
outputdf.loc[[tempPos,tempPos+1],'Date']=df.loc[pos,'Date']
outputdf.loc[tempPos,'Variable']="Start"
outputdf.loc[tempPos+1,'Variable']="End"
outputdf.loc[tempPos,'Value']=df.loc[pos,'Start']
outputdf.loc[tempPos+1,'Value']=df.loc[pos,'End']
tempPos=tempPos+2
I have more than 100k rows and it takes a very long time. How do I optimize this? Any help is appreciated. Thanks!