Fill multiple rows in between pandas dataframe rows on condition

Question

I have a dataset like below:

df = pd.DataFrame({'Date':['2019-01-01','2019-01-03','2019-01-01','2019-01-04','2019-01-01','2019-01-03'],'Name':['A','A','B','B','C','C'],'Open Price':[100,200,300,400,500,600],'Close Price':[200,300,400,500,600,700]})

As we can see, a few daily entries are missing from this table: 2019-01-02 for A, 2019-01-02 and 2019-01-03 for B, and 2019-01-02 for C.

What I'm looking to do is add dummy rows to the dataframe for these dates,

with the Close Price of each dummy row set to the Open Price of the next available entry. I don't care about the Open Price of the dummy rows; it can be either NaN or 0.

Expected output

pd.DataFrame({'Date':['2019-01-01','2019-01-02','2019-01-03','2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-01','2019-01-02','2019-01-03'],'Name':['A','A','A','B','B','B','B','C','C','C'],'Open Price':[50,'nan',150,250,'nan','nan',350,450,'nan',550],'Close Price':[200,150,300,400,350,350,500,600,550,700]})

Any help would be appreciated!

Why do the prices change in the expected output? What's the logic behind it? — Chiheb Nexus, Sep 18 '19 at 21:40
For example, the open price for A on 2019-01-03 is 150, so we expect the close price for A on 2019-01-02 to be 150. — Qianyu Zhang, Sep 18 '19 at 22:30

score 0 · Answer 1 · answered Sep 19 '19 at 02:06

Your logic for how the prices should be filled in is unclear, but to get you started, consider this, remembering to convert Date to a datetime dtype first. Note that it linearly interpolates both price columns for the missing days:

df['Date'] = pd.to_datetime(df['Date'])
df = (df.groupby('Name')
        .resample('D', on='Date')
        .mean()
        .swaplevel()
        .interpolate()
)

print(df)
                 Open Price  Close Price
Date       Name                         
2019-01-01 A     100.000000   200.000000
2019-01-02 A     150.000000   250.000000   
2019-01-03 A     200.000000   300.000000
2019-01-01 B     300.000000   400.000000
2019-01-02 B     333.333333   433.333333
2019-01-03 B     366.666667   466.666667
2019-01-04 B     400.000000   500.000000  
2019-01-01 C     500.000000   600.000000
2019-01-02 C     550.000000   650.000000
2019-01-03 C     600.000000   700.000000
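
If linear interpolation is not what you want, the rule described in the question (leave Open Price as NaN on the inserted days and set Close Price to the next available Open Price) could be sketched roughly as below. This is only a sketch under assumptions: the helper name fill_missing_days is made up for illustration, the date range for each Name is assumed to run from its first to its last existing date, and the values come from the input frame in the question rather than from the differently priced expected output.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2019-01-01', '2019-01-03', '2019-01-01', '2019-01-04', '2019-01-01', '2019-01-03'],
    'Name': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Open Price': [100, 200, 300, 400, 500, 600],
    'Close Price': [200, 300, 400, 500, 600, 700],
})
df['Date'] = pd.to_datetime(df['Date'])


def fill_missing_days(group):
    # Hypothetical helper: reindex one Name's rows onto a complete daily
    # range between that group's first and last date.
    full_range = pd.date_range(group['Date'].min(), group['Date'].max(), freq='D')
    out = group.set_index('Date').reindex(full_range)
    out.index.name = 'Date'
    out['Name'] = group['Name'].iloc[0]
    # Inserted rows have NaN in both price columns. Back-fill a copy of
    # Open Price to find the next available open, and use it only where
    # Close Price is missing; Open Price itself stays NaN on inserted rows.
    next_open = out['Open Price'].bfill()
    out['Close Price'] = out['Close Price'].fillna(next_open)
    return out.reset_index()


filled = pd.concat(
    [fill_missing_days(g) for _, g in df.groupby('Name')],
    ignore_index=True,
)
print(filled)
```

With the input above, the inserted row for A on 2019-01-02 gets a Close Price of 200 (A's open on 2019-01-03), and both inserted rows for B get 400 (B's open on 2019-01-04). If you prefer 0 over NaN for the dummy Open Price, a final filled['Open Price'].fillna(0) should do it.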
