I have a csv file with bid/ask prices of many bonds (using ISIN identifiers) for the past 1 yr. Using these historical prices, I'm trying to calculate the historical volatility for each bond. Although it should be typically an easy task, the issue is not all bonds have exactly same number of days of trading price data, while they're all in same column and not stacked. Hence if I need to calculate a rolling std deviation, I can't choose a standard rolling window of 252 days for 1 yr.
The data set has this format-
BusinessDate | ISIN | Bid | Ask |
---|---|---|---|
Date 1 | ISIN1 | P1 | P2 |
Date 2 | ISIN1 | P1 | P2 |
Date 252 | ISIN1 | P1 | P2 |
Date 1 | ISIN2 | P1 | P2 |
Date 2 | ISIN2 | P1 | P2 |
......
& so on.
My current code is as follows-
vol_df = pd.read_csv('hist_prices.csv')
vol_df['BusinessDate'] = pd.to_datetime(vol_df['BusinessDate'])
vol_df[Mid Price'] = vol_df[['Bid', 'Ask']].mean(axis = 1)
vol_df['log_return'] = vol_df.groupby('ISIN')['Mid Price'].apply(lambda x: np.log(x) - np.log(x.shift(1)))
vol_df['hist_vol'] = vol_df['log_return'].std() * np.sqrt(252)
The last line of code seems to be giving all NaN values in the column. This is most likely because the operation for calculating the std deviation is happening on the same row number and not for a list of numbers. I tried replacing the last line to use rolling_std-
vol_df.set_index('BusinessDate').groupby('ISIN').rolling(window = 1, freq = 'A').std()['log_return']
But this doesn't help either. It gives 2 numbers for each ISIN. I also tried to use pivot() to place the ISINs in columns and BusinessDate as index, and the Prices as "values". But it gives an error. Also I've close to 9,000 different ISINs and hence putting them in columns to calculate std() for each column may not be the best way. Any clues on how I can sort this out?