I have a DataFrame with one column, called maturity_dates. I also have a DatetimeIndex, called simulationdates.
I want to create a DataFrame where each cell is the maturity date minus the simulationdates value. Additionally, I want the result floored at 0 and expressed in years.
The code below does the job, but it is very slow on large dataframes. Is there a quicker way without the for loop?
import numpy as np
import pandas as pd
import time
maturity_dates_raw = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021]*40,
                                   'month': [2, 3, 3, 3, 3, 3, 3]*40,
                                   'day': [4, 5, 5, 5, 5, 5, 5]*40})
maturity_dates = pd.to_datetime(maturity_dates_raw)
date = pd.to_datetime("4th of July, 2015")
simulationdates = date + pd.to_timedelta(np.arange(5000), 'D')
t0 = time.time()
trade_m = pd.DataFrame(index=maturity_dates.index, columns=simulationdates)
mat_date = pd.to_datetime(maturity_dates)
dates = pd.DatetimeIndex.to_series(simulationdates)
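for i in range(trade_m.shape[1]):
    # Whole days from simulation date i to each maturity, in years, floored at 0.
    # .dt.days is used because recent pandas rejects .astype('timedelta64[D]').
    trade_m.iloc[:, i] = np.maximum(
        (mat_date - dates.iloc[i]).dt.days / 365.0, 0.0)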
t1 = time.time()
print('Time to maturity done in {} seconds.'.format(np.round(t1 - t0, 4)))
print(trade_m)
Time to maturity done in 0.018 seconds.
   2015-07-04  2015-07-05  2015-07-06  2015-07-07  2015-07-08
0    0.000000    0.000000    0.000000    0.000000    0.000000
1    0.671233    0.668493    0.665753    0.663014    0.660274
2    1.671233    1.668493    1.665753    1.663014    1.660274
3    2.671233    2.668493    2.665753    2.663014    2.660274
4    3.671233    3.668493    3.665753    3.663014    3.660274
5    4.673973    4.671233    4.668493    4.665753    4.663014
6    5.673973    5.671233    5.668493    5.665753    5.663014
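
For reference, one possible loop-free version of the calculation described above is sketched here using NumPy broadcasting. This is a rough sketch rather than a benchmarked solution: the helper name time_to_maturity and its days_per_year parameter are made up for illustration, and it assumes maturity_dates is a pandas Series of datetimes and that all dates fall on midnight, as they do in the example data.

```python
import numpy as np
import pandas as pd


def time_to_maturity(maturity_dates, simulationdates, days_per_year=365.0):
    """Years from each simulation date to each maturity date, floored at 0.

    Hypothetical helper: rows follow maturity_dates, columns follow
    simulationdates, matching the layout of trade_m in the question.
    """
    # Pull out raw NumPy arrays at whole-day resolution.
    mat = pd.DatetimeIndex(maturity_dates).values.astype('datetime64[D]')
    sim = pd.DatetimeIndex(simulationdates).values.astype('datetime64[D]')

    # An (n, 1) column minus a (1, m) row broadcasts to an (n, m) matrix
    # of day differences; dividing by one day converts it to floats.
    days = (mat[:, None] - sim[None, :]) / np.timedelta64(1, 'D')

    # Convert to years and clip negative values (already matured) to 0.
    years = np.maximum(days / days_per_year, 0.0)

    return pd.DataFrame(years, index=maturity_dates.index,
                        columns=simulationdates)


# Small usage example with two maturities and three simulation dates.
example_maturities = pd.Series(pd.to_datetime(['2015-02-04', '2016-03-05']))
example_simdates = pd.date_range('2015-07-04', periods=3, freq='D')
example_result = time_to_maturity(example_maturities, example_simdates)
print(example_result)
```

The whole matrix is computed in a single array operation, so the per-column Python loop and the repeated iloc assignments are no longer needed. Whether the speed-up matters in practice depends on the sizes involved, and the full n-by-m float matrix has to fit in memory.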