1

I have a data frame:

Id  modification_date   date1   date2   estimate_date
aaa 2022-07-19T13:27:01Z    2022-07-21T20:01:00Z    2022-07-21T22:59:00Z    
aaa 2022-07-20T13:49:21Z    2022-07-21T20:01:00Z    2022-07-21T22:59:00Z    2022-07-22T01:06:05Z
aaa 2022-07-21T14:24:20Z    2022-07-21T14:01:00Z    2022-07-21T22:59:00Z    2022-07-21T19:06:33Z
aaa 2022-07-21T18:52:23Z    2022-07-21T14:01:00Z    2022-07-21T22:59:00Z    
bbb 2022-07-18T21:04:13Z    2022-07-20T13:01:00Z    2022-07-21T00:59:00Z    
bbb 2022-07-19T20:30:20Z    2022-07-20T13:01:00Z    2022-07-21T00:59:00Z    
bbb 2022-07-20T17:37:58Z    2022-07-20T13:01:00Z    2022-07-21T00:59:00Z    
ddd 2022-07-11T14:40:09Z    2022-07-12T22:01:00Z    2022-07-13T00:59:00Z    2022-07-12T23:00:33Z
ddd 2022-07-11T17:29:11Z    2022-07-12T22:01:00Z    2022-07-13T00:59:00Z    2022-07-13T00:37:21Z
ddd 2022-07-13T20:36:11Z    2022-07-12T22:01:00Z    2022-07-13T00:59:00Z    
eee 2022-07-14T14:44:08Z    2022-07-18T17:01:00Z    2022-07-18T19:59:00Z    2022-07-18T21:36:38Z
eee 2022-07-18T19:45:42Z    2022-07-21T17:01:00Z    2022-07-21T19:59:00Z    2022-07-21T17:01:00Z
eee 2022-07-19T20:46:39Z    2022-07-21T17:01:00Z    2022-07-21T19:59:00Z    
eee 2022-07-21T13:35:07Z    2022-07-21T17:01:00Z    2022-07-21T19:59:00Z    2022-07-21T20:01:41Z
eee 2022-07-21T21:37:24Z    2022-07-21T17:01:00Z    2022-07-21T19:59:00Z    

I want to keep the latest modified record, but also, I dont want to miss on the least estimated date for that day's date1 and date2.

What I am doing is first sorting on the basis of modification date on group of id, and then again grouping date1 and date2 together where the last modified record's date1 and date2 group is same. and sorting on that group to get minimum estimation date.

What I am expecting is:

Id  modification_date   date1   date2   new_estimate_date
aaa 2022-07-21T18:52:23Z    2022-07-21T14:01:00Z    2022-07-21T22:59:00Z    2022-07-21T19:06:33Z
bbb 2022-07-20T17:37:58Z    2022-07-20T13:01:00Z    2022-07-21T00:59:00Z    
ddd 2022-07-13T20:36:11Z    2022-07-12T22:01:00Z    2022-07-13T00:59:00Z    2022-07-12T23:00:33Z
eee 2022-07-21T21:37:24Z    2022-07-21T17:01:00Z    2022-07-21T19:59:00Z    2022-07-21T17:01:00Z

I have tried:

def estimatedCT(df):           
        for i, j in df.sort_values('modification_date', ascending=False).groupby('id'):
                # print(j.head(1))
                for k, l in j.sort_values('estimate_date', ascending=True, na_position='last').groupby(['date1', 'date2']):
                        final_val = l['estimate_date'].head(1)
                        print(final_val)
                        if pd.notnull(final_val).all():
                                print(final_val)
                                return final_val
                        else:
                                print(np.nan)
                                return np.nan
                                
df['new_estimate_date'] = df['estimate_date'].apply(lambda x: estimatedCT(x))

But it is somehow giving me values of the last group in every record. Please suggest

RCN
  • 101
  • 10
  • what is your expected output? – Dejene T. Jul 27 '22 at 16:50
  • I have added it in the question. in the second snippet. – RCN Jul 27 '22 at 16:52
  • sorry. I have updated one small thing, I am expecting new estimated date column – RCN Jul 27 '22 at 17:05
  • This new estimated date will come from the row where there is max(modification date) in the group of short id and max modification date. And further a group of date1, date2 and max(modification date) , and among that group a minimum value in estimated date. – RCN Jul 27 '22 at 17:19

0 Answers0