0

I have two data frames in csv files. First data described traffic incidents (df1) and second data has the traffic record data for each 15 minutes(df2). I want to merge between them based on the closest time. I used python pandas_merge_asof and I got the nearest match. but I want the 30 minutes records before and after the match from the traffic record data. And I want to join the closest incidents to the traffic data time. if the incidents occured 14:02:00, it will be mereged with the traffic date that recorded at 14:00:00

For example:

1- Incidents data

Date                detector_id              Inident_type
09/30/2015 8:00:00      1                      crash
09/30/2015 8:02:00      1                    congestion
04/22/2014 15:30:00     9                    congestion
04/22/2014 15:33:00     9                  Emergency vehicle 

    

2 - Traffic data

Date              detector_id               traffic_volume
09/30/2015 7:30:00      1                         55
09/30/2015 7:45:00      1                         45
09/30/2015 8:00:00      1                         60
09/30/2015 8:15:00      1                         200
09/30/2015 8:30:00      1                         70
04/22/2014 15:00:00     9                         15
04/22/2014 15:15:00     9                          7
04/22/2014 15:30:00     9                         50
04/22/2014 15:45:00     9                         11
04/22/2014 16:00:00     9                         7

2- the desired table

Date              detector_id               traffic_volume     Incident_type
09/30/2015 7:30:00      1                         55                  NA
09/30/2015 7:45:00      1                         45                  NA
09/30/2015 8:00:00      1                         60                Crash
09/30/2015 8:00:00      1                         60              congestion   
09/30/2015 8:15:00      1                         200                 NA
09/30/2015 8:30:00      1                         70                  NA
04/22/2014 15:00:00     9                         15                  NA
04/22/2014 15:15:00     9                          7                  NA
04/22/2014 15:30:00     9                         50              Congestion
04/22/2014 15:30:00     9                         50        Emergency vehicle   
04/22/2014 15:45:00     9                         11                  NA
04/22/2014 16:00:00     9                         7                   NA

The code that I used as follow

Merge = pd.merge_asof(df2, df1, left_index = True, right_index = True, allow_exact_maches = False,
on='Date', by='detector_id', direction='nearest')

but it gave me this table.

Date              detector_id               traffic_volume     Incident_type
09/30/2015 8:00:00      1                         60                Crash
04/22/2014 15:30:00     9                         50              Congestion

and I want to know the situation after and before the incidents occur.

Any Idea? Thank you.

*If I made mistake by asking like this way, please let me know.

1 Answers1

0

For anyone has the same problem and want to do merge by using pandas.merge_asof, you have to use the Tolerance function. This function helps you adjust the time different between the two datasets.

But you may face two problems related to Timedelta and sorting index. so the solution of Timedelta is converting the time to datetime as follow:

df1.Date = pd.to_datetime(df1.Date)
df2.Date = pd.to_datetime(df2.Date)

and the sorting index you need apply sort in your main code as follow:

x = pd.merge_asof(df1.sort_values('Date'), #sort_values fix the error"left Key must be sorted"
                  df2.sort_values('Date'), 
                  on = 'Date', 
                  by = 'Detector_id',
                  direction = 'backward', 
                  tolerance =pd.Timedelta('45 min'))

The direction could be nearest which in my case will select all the records accord before and after the match records within 45 minutes.

The direction could be backward will merge all records within 45 minutes after the exact or nearest match and Forward will select all the records within 45 minutes before the exact or nearest match.

Thank you and hopefully this will help anyone in future.