Suppose that you have two data frames which can be created using code below:
df1 = pd.DataFrame(data={'start_date': ['2021-07-02', '2021-07-09',
'2021-07-16', '2021-07-23',
'2021-07-30'],
'end_date': ['2021-07-09', '2021-07-16',
'2021-07-23', '2021-07-30',
'2021-08-06']})
price_df = pd.DataFrame(data={'date': ['2021-07-01', '2021-07-03',
'2021-07-08', '2021-07-10',
'2021-07-16', '2021-07-22',
'2021-07-30', '2021-07-31',
'2021-08-06'],
'price': [47, 62, 107, 171, 30, 154, 42,
143, 131]})
adding conversion to datetime using code below
df1['start_date'] = pd.to_datetime(df1['start_date'])
df1['end_date'] = pd.to_datetime(df1['end_date'])
price_df['date'] = pd.to_datetime(price_df['date'])
Using df1['start_date'].isin(price_df['date'])
I can check if dates in column start_date
of data frame df1
exists in data frame price_df
. In case that is not true I want to replace the start_date
with last available date from price_df
. Similar process needs to be performed on column end_date
of data frame df1
. Finally prices can be extracted using a join.
Expected Output
+------------+------------+-------------+-----------+
| start_date | end_date | start_price | end_price |
+------------+------------+-------------+-----------+
| 01/07/2021 | 08/07/2021 | 47 | 107 |
| 08/07/2021 | 16/07/2021 | 107 | 30 |
| 16/07/2021 | 22/07/2021 | 30 | 154 |
| 22/07/2021 | 30/07/2021 | 154 | 42 |
| 30/07/2021 | 06/08/2021 | 42 | 131 |
+------------+------------+-------------+-----------+
I attempted this code:
df2 = pd.DataFrame()
df2['start_date'] = np.where(df1['start_date']
.isin(price_df['date']),
df1['start_date'],
price_df[
price_df.date < df1['start_date']][
'date'].max())
I am getting the following error:
ValueError: Can only compare identically-labeled Series objects