6

Hi I need to align some time series data with nearest timestamps, so I think pandas.merge_asof could be a good candidate. However, it does not have an option to set how='outer' like in the standard merge method.

An example can be:

df1:

                                   Value1
Time
2020-07-17 14:25:03.535906075      108
2020-07-17 14:25:05.457247019      110
2020-07-17 14:25:07.467777014      126

df2:

                                  Value2
Time
2020-07-17 14:25:03.535018921     222
2020-07-17 14:25:04.545104980     150
2020-07-17 14:25:07.476825953      60

Then for example, do this merge_asof:

pd.merge_asof(df1, df2, left_index=True, right_index=True, direction='nearest', tolerance=pd.Timedelta('0.3s'))

The results will be:

                               Value1  Value2
Time
2020-07-17 14:25:03.535906075     108   222.0
2020-07-17 14:25:05.457247019     110     NaN
2020-07-17 14:25:07.467777014     126    60.0

But what I want is:

                               Value1  Value2
Time
2020-07-17 14:25:03.535906075     108   222.0
2020-07-17 14:25:04.545104980     NaN   150.0   <---- this is the difference
2020-07-17 14:25:05.457247019     110     NaN
2020-07-17 14:25:07.467777014     126    60.0

basically like a full outer join.

Any suggestion? Thanks in advance.

EDIT:

So this is the case with 2 dataframes. What if, say for example, there are 10 dataframes (i.e. df1, df2, ..., df10) need to do this "nearest" merging, what will be a good way to do?

circle999
  • 63
  • 1
  • 5

2 Answers2

3
  1. Unfortunately, there is no how parameter in pd.merge_asof like there is with pd.merge, otherwise you could simply pass how='outer'.
  2. As a workaround, you can append the unmatched values from the other dataframe manually
  3. Then, sort the index with .sort_index()

df3 = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction='nearest', tolerance=pd.Timedelta('0.3s'))
df4 = pd.merge_asof(df2, df1, left_index=True, right_index=True, direction='nearest', tolerance=pd.Timedelta('0.3s'))
df5 = df3.append(df4[df4['Value1'].isnull()]).sort_index()
df5
Out[1]: 
                               Value1  Value2
Time                                         
2020-07-17 14:25:03.535906075   108.0   222.0
2020-07-17 14:25:04.545104980     NaN   150.0
2020-07-17 14:25:05.457247019   110.0     NaN
2020-07-17 14:25:07.467777014   126.0    60.0
David Erickson
  • 16,433
  • 2
  • 19
  • 35
  • Hi thanks! What do you think could be a good way to merge more than 2 dataframes? Please see the updated question. – circle999 Nov 07 '20 at 02:10
  • @circle999 That requires a different solution. Can you create a new question and reference back to this question? YOu can copy and paste all of your data, and add multiple example dataframes (like 3 instead of 2). It's generally frowned upon to update the question like this. – David Erickson Nov 07 '20 at 02:12
2

This seems simple enough yet no direct solution. There's an option to merge again to bring in the missing rows:

# enumerate the rows of `df2` to later identify which are missing
df2 = df2.reset_index().assign(idx=np.arange(df2.shape[0]))
(pd.merge_asof(df1.reset_index(), 
               df2[['Time','idx']], 
              on='Time',
              direction='nearest', 
              tolerance=pd.Timedelta('0.3s'))
  .merge(df2, on='idx', how='outer')                        # merge back on row number
  .assign(Time=lambda x: x['Time_x'].fillna(x['Time_y']))   # fill the time
  .set_index(['Time'])                                      # set index back
  .drop(['Time_x','Time_y','idx'], axis=1)
  .sort_index()
)

                               Value1  Value2
Time                                         
2020-07-17 14:25:03.535906075   108.0   222.0
2020-07-17 14:25:04.545104980     NaN   150.0
2020-07-17 14:25:05.457247019   110.0     NaN
2020-07-17 14:25:07.467777014   126.0    60.0
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Hi thanks! What do you think could be a good way to merge more than 2 dataframes? Please see the updated question. – circle999 Nov 07 '20 at 02:10