After some preprocessing, I am stuck on merging the rows of my dataset. I would like to collapse all rows that share the same batch number into a single row, so that each process-step column holds its timestamp. The data was originally stacked; I then unstacked it by process step (40, 42, 50). My goal after that is to compute the time difference between two of the columns in minutes.

DATASET:

process         40                   42                   50
index   batch
64177   699042  NaT                  2019-01-10 18:28:05  NaT
171272  699042  NaT                  NaT                  2019-01-10 18:28:20
120655  699042  2019-01-10 17:40:09  NaT                  NaT
120656  699043  2019-01-10 17:40:09  NaT                  NaT
67362   699043  NaT                  2019-01-10 20:43:25  NaT
168373  699043  NaT                  NaT                  2019-01-10 20:43:33

WHAT I WANT IS:

process  40                   42                   50
batch
699042   2019-01-10 17:40:09  2019-01-10 18:28:05  2019-01-10 18:28:20
699043   2019-01-10 17:40:09  2019-01-10 20:43:25  2019-01-10 20:43:33
BENY

1 Answer

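You can try groupby on the batch index level (level=1) with first, which returns the first non-null value of each column within every group: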

urdf = df.groupby(level=1).first()
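
To check this against the sample data and cover the minutes calculation from the question, here is a minimal sketch. It assumes the frame has a two-level index named index and batch and datetime columns 40, 42 and 50, as shown in the question; the names rows, merged and minutes are only illustrative, and the choice of steps 40 and 42 for the difference is a guess at what is wanted.

```python
import pandas as pd

# Rebuild the sample from the question; None stands in for the NaT cells.
rows = [
    (64177, 699042, None, "2019-01-10 18:28:05", None),
    (171272, 699042, None, None, "2019-01-10 18:28:20"),
    (120655, 699042, "2019-01-10 17:40:09", None, None),
    (120656, 699043, "2019-01-10 17:40:09", None, None),
    (67362, 699043, None, "2019-01-10 20:43:25", None),
    (168373, 699043, None, None, "2019-01-10 20:43:33"),
]
df = pd.DataFrame(rows, columns=["index", "batch", 40, 42, 50])
for step in (40, 42, 50):
    df[step] = pd.to_datetime(df[step])  # None becomes NaT
df = df.set_index(["index", "batch"])
df.columns.name = "process"

# first() keeps the first non-null value per column in each batch,
# which collapses the three sparse rows into one full row.
merged = df.groupby(level="batch").first()

# Time between step 40 and step 42 in minutes (assumed pair of steps).
minutes = (merged[42] - merged[40]).dt.total_seconds() / 60

print(merged)
print(minutes)
```

Grouping by level="batch" is equivalent to level=1 here and does not depend on the order of the index levels. Note that first() only gives the intended result if each batch has at most one timestamp per process step; with several, it silently keeps the earliest row's value.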
BENY