I have a dataframe that contains these events:
   ID                     m1          m2          m3          m4
1  xxxx/xxxxx.0183683234  2019-10-28  2019-11-28  2019-11-30  NaT
2  xxxx/xxxxx.0183679721  2019-11-28  2019-11-28  NaT         NaT
4  xxxx/xxxxx.0183888975  2019-11-20  2019-12-10  NaT         NaT
These events occur in temporal order, which means that:
m1 < m2 < m3 < m4 < ... < mn
The goal is to estimate m3 and m4 before they actually happen.
To do so, I use master data that gives me the duration between m2 and m3, and between m3 and m4.
The expected output is:
   ID                     m1          m2          m3          m4   M2_M3   M3_M4   m3_estimated  m4_estimated
1  xxxx/xxxxx.0183683234  2019-10-28  2019-11-28  2019-11-30  NaT  2 days  9 days  2019-11-30    2019-12-09
2  xxxx/xxxxx.0183679721  2019-11-28  2019-11-28  NaT         NaT  2 days  6 days  2019-11-30    NaT
4  xxxx/xxxxx.0183888975  2019-11-20  2019-12-10  NaT         NaT  6 days  1 days  2019-12-16    NaT
I want to recalculate the estimates every time m3 or m4 is no longer null.
Here are the functions I tried, but they don't really work:
def m3_estimated(df):
    if df['m2'] != None:
        return pd.to_datetime(df['m2']) + df['M2_M3']
    else:
        None

def m4_estimated(df):
    if df['m3'] != None:
        return pd.to_datetime(df['m3']) + df['M3_M4']
    else:
        None
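
For comparison, here is a minimal vectorized sketch of the calculation described above. It assumes that M2_M3 and M3_M4 have already been joined in from the master data as Timedelta columns (how that join is done is not shown in this post), and it rebuilds the three sample rows by hand. One likely reason the row-wise checks misbehave is that `pd.NaT != None` evaluates to True, so the `if` never filters out missing dates. Plain column arithmetic avoids the check entirely, because adding a Timedelta to NaT simply yields NaT.

```python
import pandas as pd

# Sample rows copied from the tables above. The duration columns are an
# assumption: they stand in for values looked up in the master data.
df = pd.DataFrame(
    {
        "ID": [
            "xxxx/xxxxx.0183683234",
            "xxxx/xxxxx.0183679721",
            "xxxx/xxxxx.0183888975",
        ],
        "m1": ["2019-10-28", "2019-11-28", "2019-11-20"],
        "m2": ["2019-11-28", "2019-11-28", "2019-12-10"],
        "m3": ["2019-11-30", None, None],
        "m4": [None, None, None],
        "M2_M3": [pd.Timedelta(days=d) for d in (2, 2, 6)],
        "M3_M4": [pd.Timedelta(days=d) for d in (9, 6, 1)],
    },
    index=[1, 2, 4],
)

# Convert the milestone columns to datetime64; missing values become NaT.
for col in ["m1", "m2", "m3", "m4"]:
    df[col] = pd.to_datetime(df[col])

# Column-wise arithmetic: NaT + Timedelta stays NaT, so no explicit
# null check is needed.
df["m3_estimated"] = df["m2"] + df["M2_M3"]
df["m4_estimated"] = df["m3"] + df["M3_M4"]

print(df[["m3_estimated", "m4_estimated"]])
```

Re-running the last two assignments after m3 or m4 has been filled in refreshes the estimates, which is one simple way to get the recalculation behaviour.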