I have a CSV file with bid/ask prices of many bonds (identified by ISIN) for the past year. Using these historical prices, I'm trying to calculate the historical volatility for each bond. This should typically be an easy task, but the issue is that not all bonds have the same number of days of trading price data, and they all sit in the same columns (long format) rather than one column per bond. Hence, for a rolling standard deviation, I can't choose a standard rolling window of 252 days for 1 year.

The data set has this format:

BusinessDate  ISIN   Bid  Ask
Date 1        ISIN1  P1   P2
Date 2        ISIN1  P1   P2
...
Date 252      ISIN1  P1   P2
Date 1        ISIN2  P1   P2
Date 2        ISIN2  P1   P2
...

and so on.

My current code is as follows:

import numpy as np
import pandas as pd

vol_df = pd.read_csv('hist_prices.csv')
vol_df['BusinessDate'] = pd.to_datetime(vol_df['BusinessDate'])
vol_df['Mid Price'] = vol_df[['Bid', 'Ask']].mean(axis = 1)
vol_df['log_return'] = vol_df.groupby('ISIN')['Mid Price'].apply(lambda x: np.log(x) - np.log(x.shift(1)))
vol_df['hist_vol'] = vol_df['log_return'].std() * np.sqrt(252)

The last line of code seems to give all NaN values in the column. This is most likely because the standard deviation is being calculated on the same row rather than over a window of numbers. I tried replacing the last line with a rolling std:

vol_df.set_index('BusinessDate').groupby('ISIN').rolling(window = 1, freq = 'A').std()['log_return']

But this doesn't help either; it gives 2 numbers for each ISIN. I also tried pivot() to put the ISINs in columns and BusinessDate as the index, with the prices as values, but it gives an error. Also, I have close to 9,000 different ISINs, so putting them in columns to calculate std() for each column may not be the best way. Any clues on how I can sort this out?

2 Answers

OK, this almost works now.

It does need some math per ISIN to figure out the rolling period; I just used 3 and 2 in my example. You probably need to count how many days of trading there were in the year, or whatever, and fix the window at that per ISIN somehow.

And then you need to figure out how to merge the data back. The output actually has errors because it's updating a copy, but that is kind of what I was looking for here. I am sure someone who knows more could fix it at this point. I can't get the merge working, but see the sketch after the output below for one possible way.

import numpy as np
import pandas as pd

toy_data={'BusinessDate': ['10/5/2020','10/6/2020','10/7/2020','10/8/2020','10/9/2020',
                           '10/12/2020','10/13/2020','10/14/2020','10/15/2020','10/16/2020',
                           '10/5/2020','10/6/2020','10/7/2020','10/8/2020'],
          'ISIN': [1,1,1,1,1, 1,1,1,1,1, 2,2,2,2],
          'Bid': [0.295,0.295,0.295,0.295,0.295,
                  0.296, 0.296,0.297,0.298,0.3,
                  2.5,2.6,2.71,2.8],
          'Ask': [0.301,0.305,0.306,0.307,0.308,
                  0.315,0.326,0.337,0.348,0.37,
                  2.8,2.7,2.77,2.82]}
#vol_df = pd.read_csv('hist_prices.csv')
vol_df = pd.DataFrame(toy_data)

vol_df['BusinessDate'] = pd.to_datetime(vol_df['BusinessDate'])
vol_df['Mid Price'] = vol_df[['Bid', 'Ask']].mean(axis = 1)
vol_df['log_return'] = vol_df.groupby('ISIN')['Mid Price'].apply(lambda x: np.log(x) - np.log(x.shift(1)))
vol_df.dropna(subset = ['log_return'], inplace=True)
# do some math here to calculate how many days you want to roll for an ISIN
# maybe count how many days over a 1 year period exist???
# not really sure how you'd miss days unless stuff just doesn't trade
# (but I don't need to understand it anyway)
rolling = {1: 3, 2: 2}
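# one possible way to derive the window per ISIN automatically (my assumption,
# not from the original post): use each ISIN's observed number of return days, e.g.
#   rolling = vol_df.groupby('ISIN')['log_return'].size().to_dict()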
for isin in vol_df['ISIN'].unique():
    roll = rolling[isin]
    print(f'isin={isin}, roll={roll}')
    # slicing like this gives a view; assigning into it below triggers SettingWithCopyWarning
    df_single = vol_df[vol_df['ISIN']==isin]
    df_single['rolling'] = df_single['log_return'].rolling(roll).std()
    # I can't get the right syntax to merge the data back, but this shows it
    # (note: this creates a tuple-named column per ISIN, e.g. (1, 'rolling'))
    vol_df[isin, 'rolling'] = df_single['rolling']
    print(df_single)
print(vol_df)

which outputs (minus the SettingWithCopyWarning messages):

isin=1, roll=3
  BusinessDate  ISIN    Bid    Ask  Mid Price  log_return   rolling
1   2020-10-06     1  0.295  0.305     0.3000    0.006689       NaN
2   2020-10-07     1  0.295  0.306     0.3005    0.001665       NaN
3   2020-10-08     1  0.295  0.307     0.3010    0.001663  0.002901
4   2020-10-09     1  0.295  0.308     0.3015    0.001660  0.000003
5   2020-10-12     1  0.296  0.315     0.3055    0.013180  0.006650
6   2020-10-13     1  0.296  0.326     0.3110    0.017843  0.008330
7   2020-10-14     1  0.297  0.337     0.3170    0.019109  0.003123
8   2020-10-15     1  0.298  0.348     0.3230    0.018751  0.000652
9   2020-10-16     1  0.300  0.370     0.3350    0.036478  0.010133
isin=2, roll=2
   BusinessDate  ISIN   Bid  ...    log_return  (1, rolling)   rolling
11   2020-10-06     2  2.60  ...  2.220446e-16           NaN       NaN
12   2020-10-07     2  2.71  ...  3.339828e-02           NaN  0.023616
13   2020-10-08     2  2.80  ...  2.522656e-02           NaN  0.005778

[3 rows x 8 columns]
   BusinessDate  ISIN    Bid  ...    log_return  (1, rolling)  (2, rolling)
1    2020-10-06     1  0.295  ...  6.688988e-03           NaN           NaN
2    2020-10-07     1  0.295  ...  1.665279e-03           NaN           NaN
3    2020-10-08     1  0.295  ...  1.662511e-03      0.002901           NaN
4    2020-10-09     1  0.295  ...  1.659751e-03      0.000003           NaN
5    2020-10-12     1  0.296  ...  1.317976e-02      0.006650           NaN
6    2020-10-13     1  0.296  ...  1.784313e-02      0.008330           NaN
7    2020-10-14     1  0.297  ...  1.910886e-02      0.003123           NaN
8    2020-10-15     1  0.298  ...  1.875055e-02      0.000652           NaN
9    2020-10-16     1  0.300  ...  3.647821e-02      0.010133           NaN
11   2020-10-06     2  2.600  ...  2.220446e-16           NaN           NaN
12   2020-10-07     2  2.710  ...  3.339828e-02           NaN      0.023616
13   2020-10-08     2  2.800  ...  2.522656e-02           NaN      0.005778
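
For the merge I couldn't get right, here is a minimal sketch of one way it could be written (an untested assumption on my part, reusing vol_df and the rolling dict from the code above):

# continues from the code above: vol_df and the per-ISIN window dict `rolling`
# are assumed to already exist
for isin, roll in rolling.items():
    mask = vol_df['ISIN'] == isin
    # compute the rolling std on the group's rows and assign back through .loc
    vol_df.loc[mask, 'rolling'] = vol_df.loc[mask, 'log_return'].rolling(roll).std()

Writing through .loc with a boolean mask updates the original frame in place, so there is no copy warning and no tuple-named columns.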
– Brian Z
  • Thanks Brian. Actually I had tried with a very small subset and dropped the NaN rows, and then I didn't get NaN values for the volatility. But the issue, as you can see in your example as well, is that the historical volatility for all ISINs is the same. It should be different for different ISINs (unless it's a huge coincidence that the returns over the period for different bonds were identical) – user16708102 Aug 20 '21 at 08:28
  • I did use groupby('ISIN') on the final volatility calculation so that I don't get the same number for all ISINs, and it gives NaN in all rows – user16708102 Aug 20 '21 at 08:49
  • It gives NaN because there is exactly one NaN for each group (the first entry), which spoils it all. I will modify the answer to show what I mean. Feel free to add a better sample if you'd like me to run it with that for real output. – Brian Z Aug 21 '21 at 23:39
  • No, I've dropped NaNs from the log return column, as I mentioned earlier; that's not the reason. When you do df['column name'].std(), it gives you the std deviation of the entire column, whereas what is needed is one per entry in the identifier column. If the number of days for each identifier were the same, then a rolling std() would have worked, where you can set window = number of days. However, in my case each has a different number of days. My data is on my company laptop and I can't paste it here. I can add a sample but it won't be different from what you already have – user16708102 Aug 22 '21 at 13:13
  • I think I understand better now; hopefully this edit is more what you wanted. I can't get the merge working, though I bet you can. – Brian Z Aug 23 '21 at 18:51

I was able to resolve this in a crude way:

vol_df_2 = vol_df.groupby('ISIN')['logret'].std()
vol_df_3 = vol_df_2.to_frame()
vol_df_3.rename(columns = {'logret': 'daily_std'}, inplace = True)

The first line above returns a Series, with the std deviation column still named 'logret'. The second and third lines convert it into a DataFrame and rename the daily std deviation accordingly. Finally, the annual vol can be calculated by multiplying by np.sqrt(252).

If anyone has a better way to do it in the same dataframe instead of creating a series, that'd be great.
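
One way to keep everything in the original dataframe is groupby(...).transform('std'), which broadcasts each ISIN's std back onto that ISIN's own rows. A minimal sketch, assuming the same vol_df and 'logret' column as above:

# per-ISIN daily std, broadcast back to every row of that ISIN
vol_df['daily_std'] = vol_df.groupby('ISIN')['logret'].transform('std')
# annualise assuming 252 trading days
vol_df['hist_vol'] = vol_df['daily_std'] * np.sqrt(252)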