0

I am looking for a way to make the following code work:

import pandas

path = 'data_prices.csv'
data =  pandas.read_csv(path, sep=';')
data = data.sort_values(by=['TICKER', 'DATE'], ascending=[True, False])
data.columns

I have a 2 dimensional array with three columns, the data looks like this:

DATE;TICKER;PRICE
20151231;A UN Equity;41.81
20151230;A UN Equity;42.17
20151229;A UN Equity;42.36
20151228;A UN Equity;41.78
20151224;A UN Equity;42.14
20151223;A UN Equity;41.77
20151222;A UN Equity;41.22
20151221;A UN Equity;40.83
20151218;A UN Equity;40.1
20091120;PCG UN Equity;42.1
20091119;PCG UN Equity;41.53
20091118;PCG UN Equity;41.86
20091117;PCG UN Equity;42.23
20091116;PCG UN Equity;42.6
20091113;PCG UN Equity;41.93
20091112;PCG UN Equity;41.6
20091111;PCG UN Equity;42.01

Now, I want to calculate the x-day realized volatility where x came from an input field and x should not be bigger than the number of observations.

The steps that need to be taken:

  • Calculate the log return for each line
  • Take those returns and run the standard deviation on top of it
  • Multiply by the square root of 255 to normalize for per annum volatility
Spurious
  • 1,903
  • 5
  • 27
  • 53
  • 1
    Please provide the error message you get as you said 'It already crashes there'. – albert Aug 08 '16 at 11:59
  • It looks like need `data.reset_index(inplace=True)`, because first column is index. – jezrael Aug 08 '16 at 11:59
  • added the error message. Reset index did not mitigate the error. Maybe I've put it in the wrong place? I put it right before the sort. – Spurious Aug 08 '16 at 12:05
  • Can you run `print(data.columns)` before the sort (right after reading the file) – ayhan Aug 08 '16 at 12:06
  • Yes, that works and the output is: `Index(['DATE', 'TICKER', 'PRICE'], dtype='object')` – Spurious Aug 08 '16 at 12:09
  • With this file and the code you provided I cannot reproduce your problem. It works fine. Can you try to sort again? – ayhan Aug 08 '16 at 12:11
  • Ok, thank you, it must be in the `.csv` I have because if I use the shorter version I provided above, it works fine. – Spurious Aug 08 '16 at 12:13
  • Now, I only need to calculate the returns and the volatility. I edited my original post to only reflect the current challenges. – Spurious Aug 08 '16 at 12:18
  • So... what have you tried? – Wayne Werner Aug 08 '16 at 12:30
  • http://stackoverflow.com/questions/31287552/logarithmic-returns-in-pandas-dataframe What is being done here but `'DataFrame' object has no attribute 'price'` which is not true, I do have a column called price. – Spurious Aug 08 '16 at 12:32
  • ok I didn't put it in capital letters which caused the error. Now I only need to get the x-day volatility – Spurious Aug 08 '16 at 12:36

1 Answers1

2

Apologies, it's not fully clear on the sort of output you're hoping for so I've assumed you want to enter a ticker and a period (x) and see the current volatility number. Below I have also made use of numpy, in case you don't have that library.

Essentially I've created a DataFrame of all the original data and then a new DF filtered for the given ticker (where the user only needs to type in the 'A' or 'PCG' part, because 'UN Equity' is assumed constant). In this new DF, after checking that your period (x) input is not too high, it will output the most recent annualised volatility value.

import numpy as np
import pandas as pd

data = pd.read_csv('dump.csv', sep=';')
data = data.sort_values(by=['TICKER','DATE'],ascending=[True,True])


def vol(ticker, x):
    df = pd.DataFrame(data)
    df['pct_chg'] = df.PRICE.pct_change()
    df['log_rtn'] = np.log(1 + df.pct_chg)

    df_filtered = df[df.TICKER==ticker+' UN Equity']

    max_x = len(df_filtered) - 1
    if x > max_x:
        print('Too many periods. Reduce x')

    df_filtered['vol'] = pd.rolling_std(df_filtered.log_rtn, window=x) * (255**0.5)

    print(df_filtered.vol.iloc[-1])

As an example, with an input of vol('PCG',6) the output is 0.187855386042

Probably not the most elegant and apologies if I've misunderstood your request.

sk877
  • 406
  • 2
  • 12
  • I still get the error and I think it is due to the fact that the dataset is too large. I will try to make it work and see if your solution will solve it as well. – Spurious Aug 08 '16 at 18:34
  • In the meantime, I fixed my issue and your code is helpful. The problem here is that I cannot enter a date. – Spurious Aug 09 '16 at 09:43