Join two Pandas Series with different DateTimeIndex

Question

I have two pandas series with DateTimeIndex. I'd like to join these two series such that the resulting DataFrame uses the index of the first series and "matches" the values from the second series accordingly (using a linear interpolation in the second series).

First Series:

2020-03-01    1
2020-03-03    2
2020-03-05    3
2020-03-07    4

Second Series:

2020-03-01    20
2020-03-02    22
2020-03-05    25
2020-03-06    35
2020-03-07    36
2020-03-08    45

Desired Output:

2020-03-01    1    20
2020-03-03    2    23
2020-03-05    3    25
2020-03-07    4    36

Code for generating the input data:

import pandas as pd
import datetime as dt

s1 = pd.Series([1, 2, 3, 4])
s1.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 3), dt.date(2020, 3, 5), dt.date(2020, 3, 7)])

s2 = pd.Series([20, 22, 25, 35, 36, 45])
s2.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 2), dt.date(2020, 3, 5), dt.date(2020, 3, 6), dt.date(2020, 3, 7), dt.date(2020, 3, 8)])

jezrael · Accepted Answer · 2020-05-12T10:37:06.850

Use concat with an inner join, which keeps only the dates present in both Series:

df = pd.concat([s1, s2], axis=1, keys=('s1','s2'), join='inner')
print (df)
            s1  s2
2020-03-01   1  20
2020-03-05   3  25
2020-03-07   4  36

Alternative solution: interpolate s2 over the combined index (weighted by date), then drop the rows with missing values, i.e. the dates that are not in s1:

df = (pd.concat([s1, s2], axis=1, keys=('s1','s2'))
        .assign(s2 = lambda x: x.s2.interpolate('index'))
        .dropna())
print (df)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.0
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0
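The result above is float because the outer concat puts NaN into s1 before the rows are dropped. If s1 should keep its integer dtype, one possible variant (a sketch, not part of the original answer) interpolates s2 on the union of both indexes and then reindexes onto `s1.index`. The name `s2_on_s1` is made up for illustration, and `method='time'` is used here as a date-weighted method comparable to `'index'`.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4],
               index=pd.to_datetime(['2020-03-01', '2020-03-03',
                                     '2020-03-05', '2020-03-07']))
s2 = pd.Series([20, 22, 25, 35, 36, 45],
               index=pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-05',
                                     '2020-03-06', '2020-03-07', '2020-03-08']))

# Put s2 on the union of both indexes, fill the gaps with date-weighted
# interpolation, then keep only the dates that appear in s1.
s2_on_s1 = (s2.reindex(s2.index.union(s1.index))
              .interpolate(method='time')
              .reindex(s1.index))

# s1 is never reindexed, so it keeps its integer dtype; s2 comes back as float.
df = pd.DataFrame({'s1': s1, 's2': s2_on_s1})
print(df)
```

Because only s2 is reindexed, no NaN ever enters s1 and no `dropna()` is needed.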

edited May 12 '20 at 10:37

answered May 12 '20 at 09:49


I don't think it matches the OP's output – sammywemmy May 12 '20 at 09:56
@sammywemmy - yep, because of the input data – jezrael May 12 '20 at 09:57
true dat. my bad – sammywemmy May 12 '20 at 09:57
@sammywemmy - Now I'm not 100% sure, so I added a new solution (also a bit different) – jezrael May 12 '20 at 10:00
Thanks @jezrael! I just found out that I need to use `interpolate('index')` to get the desired output. Can you edit your post accordingly? Then I'll accept this as the correct answer. – maz May 12 '20 at 10:34

codeape · Answer 2 · 2020-05-12T10:24:22.373

Construct combined dataframe

# there are many ways to construct a dataframe from series, this uses the constructor:
df = pd.DataFrame({'s1': s1, 's2': s2})
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  NaN  22.0
2020-03-03  2.0   NaN
2020-03-05  3.0  25.0
2020-03-06  NaN  35.0
2020-03-07  4.0  36.0
2020-03-08  NaN  45.0

Interpolate

df = df.interpolate()
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  1.5  22.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-06  3.5  35.0
2020-03-07  4.0  36.0
2020-03-08  4.0  45.0

Restrict rows

# Only keep the rows that were in s1's index. 
# Several ways to do this, but this example uses .filter
df = df.filter(s1.index, axis=0)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0

Convert numbers back to int64

df = df.astype('int64')
            s1  s2
2020-03-01   1  20
2020-03-03   2  23
2020-03-05   3  25
2020-03-07   4  36

One-liner:

df = pd.DataFrame({'s1': s1, 's2': s2}).interpolate().filter(s1.index, axis=0).astype('int64')
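One caveat: `interpolate()` defaults to `method='linear'`, which treats the rows as equally spaced and ignores the dates. That is where the 23.5 on 2020-03-03 comes from, and the final `astype('int64')` only reaches 23 by truncating it. A small sketch of the difference, assuming date-weighted interpolation is what the question asks for (the names `combined`, `by_position` and `by_date` are illustrative):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4],
               index=pd.to_datetime(['2020-03-01', '2020-03-03',
                                     '2020-03-05', '2020-03-07']))
s2 = pd.Series([20, 22, 25, 35, 36, 45],
               index=pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-05',
                                     '2020-03-06', '2020-03-07', '2020-03-08']))

combined = pd.DataFrame({'s1': s1, 's2': s2})

# Default 'linear': rows count as equally spaced, so 2020-03-03 gets the
# midpoint of its neighbours 22 and 25.
by_position = combined['s2'].interpolate()

# 'index': weights by the real gaps between dates, so 2020-03-03 sits one
# third of the way from 2020-03-02 (22) to 2020-03-05 (25).
by_date = combined['s2'].interpolate(method='index')

day = pd.Timestamp('2020-03-03')
print(by_position[day], by_date[day])

# Keep only s1's dates; s1 itself is untouched and stays integer.
df = pd.DataFrame({'s1': s1, 's2': by_date.loc[s1.index]})
print(df)
```

With other input data the truncation in `astype('int64')` would not necessarily hide the difference between the two methods.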

Thanks for the example @codeape, but it does not solve the problem I have. Taking your example, the result on 2020-03-01 should be 21 (s1+s2=21), on 2020-03-02 it should be 23 (s1+s2=22+1) and on 2020-03-03 it should be 24 (s1+s2=24), not 20 and 23 as in your example. – Kodsama Aug 14 '23 at 13:49

score 0 · Answer 3 · answered Aug 14 '23 at 14:09

Thanks @codeape for the example. It did not do what I needed, so I adapted it to solve my problem. Here is my solution:

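# room1 and room2 are assumed to be two Series with a DatetimeIndex
# (they play the role of s1 and s2 in the question)

# Create a dataframe with two columns, aligned on the union of both indexes
df = pd.DataFrame({'room1': room1, 'room2': room2})
print(df.shape)
print(df)

# Forward-fill the NaN values in each column with the previous value,
# then put 0 where there is no previous value to fill from.
# Finish by converting back to integers.
# (ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas)
df = df.ffill().fillna(0).astype('int64')
print(df.shape)
print(df)

# To get the final result, add the two columns
result = df['room1'] + df['room2']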
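For reference, here is a self-contained run of the forward-fill approach. The values of `room1` and `room2` are not given in this answer, so the question's s1 and s2 data are used as stand-ins (an assumption), and `ffill()` is used in place of `fillna(method='ffill')`, which recent pandas versions deprecate.

```python
import pandas as pd

# Stand-in data: the question's s1 and s2, reused as room1 and room2.
room1 = pd.Series([1, 2, 3, 4],
                  index=pd.to_datetime(['2020-03-01', '2020-03-03',
                                        '2020-03-05', '2020-03-07']))
room2 = pd.Series([20, 22, 25, 35, 36, 45],
                  index=pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-05',
                                        '2020-03-06', '2020-03-07', '2020-03-08']))

# Align both Series on the union of their indexes (missing dates become NaN).
df = pd.DataFrame({'room1': room1, 'room2': room2})

# Carry the last known value forward, use 0 where nothing precedes,
# and convert back to integers.
df = df.ffill().fillna(0).astype('int64')

# Sum of the two columns on every date of the combined index.
result = df['room1'] + df['room2']
print(result)
```

On this data the first three sums are 21, 23 and 24, the values asked for in the comment on the previous answer.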
