1

I have two pandas series with DateTimeIndex. I'd like to join these two series such that the resulting DataFrame uses the index of the first series and "matches" the values from the second series accordingly (using a linear interpolation in the second series).

First Series:

2020-03-01    1
2020-03-03    2
2020-03-05    3
2020-03-07    4

Second Series:

2020-03-01    20
2020-03-02    22
2020-03-05    25
2020-03-06    35
2020-03-07    36
2020-03-08    45

Desired Output:

2020-03-01    1    20
2020-03-03    2    23
2020-03-05    3    25
2020-03-07    4    36

Code for generating the input data:

import pandas as pd
import datetime as dt

s1 = pd.Series([1, 2, 3, 4])
s1.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 3), dt.date(2020, 3, 5), dt.date(2020, 3, 7)])

s2 = pd.Series([20, 22, 25, 35, 36, 45])
s2.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 2), dt.date(2020, 3, 5), dt.date(2020, 3, 6), dt.date(2020, 3, 7), dt.date(2020, 3, 8)])

maz
  • 143
  • 8

3 Answers3

4

Use concat with inner join:

df = pd.concat([s1, s2], axis=1, keys=('s1','s2'), join='inner')
print (df)
            s1  s2
2020-03-01   1  20
2020-03-05   3  25
2020-03-07   4  36

Solution with interpolate of s2 Series and then removed rows with missing values:

df = (pd.concat([s1, s2], axis=1, keys=('s1','s2'))
        .assign(s2 = lambda x: x.s2.interpolate('index'))
        .dropna())
print (df)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.0
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Construct combined dataframe

# there are many ways to construct a dataframe from series, this uses the constructor:
df = pd.DataFrame({'s1': s1, 's2': s2})
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  NaN  22.0
2020-03-03  2.0   NaN
2020-03-05  3.0  25.0
2020-03-06  NaN  35.0
2020-03-07  4.0  36.0
2020-03-08  NaN  45.0

Interpolate

df = df.interpolate()
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  1.5  22.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-06  3.5  35.0
2020-03-07  4.0  36.0
2020-03-08  4.0  45.0

Restrict rows

# Only keep the rows that were in s1's index. 
# Several ways to do this, but this example uses .filter
df = df.filter(s1.index, axis=0)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0

Convert numbers back to int64

df = df.astype('int64')
        s1  s2
2020-03-01   1  20
2020-03-03   2  23
2020-03-05   3  25
2020-03-07   4  36

One-liner:

df = pd.DataFrame({'s1': s1, 's2': s2}).interpolate().filter(s1.index, axis=0).astype('int64')

Documentation links:

codeape
  • 97,830
  • 24
  • 159
  • 188
  • Thanks for the example @codeape, but it does not solve the problem I have. If I take your example on 2020-03-01 is 21 (s1+s2=21), 2023-03-02 is 23 (s1+s2=22+1) and 2020-03-03 is 24 (s1+s2=24) and not 20 and 23 as resulting in your example. – Kodsama Aug 14 '23 at 13:49
0

Thanks @codeape for the example, even if not doing what I needed I changed it to fix the problem. Here is my solution:

# Create a dataframe with two columns
df = pd.DataFrame({'room1': room1, 'room2': room2})
print(df.shape, df)

# Fill the NaN values with the previous from each column,
# then, add 0 to the non filled one (ones there is no previous value for)
# Finish by converting back to integers
df = df.fillna(method='ffill').fillna(0).astype('int64')
print(df.shape, df)

# To get the final result, add the two columns
result = df['room1'] + df['room2']
Kodsama
  • 131
  • 1
  • 12