After a long time of reading here with interest, it's now time for my first question.
I am trying to create a 3D volatility surface from individual data points. The basic structure looks like this:
Strike | 42 | 77 | 133 | 224 | 315 | 406 | 595 |
---|---|---|---|---|---|---|---|
7.2 | NaN | NaN | NaN | NaN | NaN | NaN | 54 |
13.0 | NaN | NaN | 66 | 46 | 60 | NaN | NaN |
14.0 | NaN | NaN | 61 | 60 | 58 | 54 | 51 |
15.0 | NaN | 65 | 58 | 57 | 56 | NaN | NaN |
15.5 | 74 | 62 | NaN | NaN | NaN | NaN | NaN |
18.0 | 62 | 55 | 53 | 51 | 50 | 46 | 45 |
18.5 | 59 | 53 | NaN | NaN | NaN | NaN | NaN |
19.0 | 57 | 52 | 51 | 50 | 48 | NaN | NaN |
19.5 | 56 | 51 | NaN | NaN | NaN | NaN | NaN |
20.0 | 54 | 50 | 49 | 48 | 47 | 45 | 42 |
21.0 | 51 | 48 | NaN | NaN | NaN | NaN | NaN |
22.0 | 49 | 46 | 46 | 45 | 43 | NaN | NaN |
27.0 | 46 | 43 | NaN | NaN | NaN | NaN | NaN |
28.0 | 46 | 43 | 42 | 41 | 39 | 38 | 37 |
44.0 | NaN | NaN | NaN | NaN | NaN | NaN | 33 |
48.0 | NaN | NaN | NaN | NaN | NaN | NaN | 34 |
This table contains random data, but it is close enough to my real data (my values are all floats) to show the basic traits. The column names are the times to maturity in days, the index is the strike price, and the cell values are the volatilities. I identified two problems (at least for me; maybe they are not problems at all):
- Randomly distributed NaN values in between the data.
- Uneven length of index and columns.
In the end I want to be able to assign a volatility value to any arbitrary pair of days to maturity and strike. In two dimensions this was possible using interp1d (one function per maturity band). But that gives me a separate function for each maturity band, and if a value lies between two maturity bands, I don't know how to calculate it. So I tried the following for the 3D case:
Griddata:
df_test = df2_0.iloc[9:16,0:7]
X=df_test.columns.values
Y=df_test.index.values
Z=df_test.values
test = np.ma.masked_invalid(df_test)
XX,YY = np.meshgrid(np.linspace(min(X),max(X),10),np.linspace(min(Y),max(Y),10))
x1=XX[~test.mask]
y1=YY[~test.mask]
df_test1=test[~test.mask]
d = interpolate.griddata((x1,y1), df_test1.ravel(),(XX,YY),method="cubic")
Griddata was one of the most promising attempts because it is easy to set up. It worked when I used iloc to select only part of the DataFrame above. Without iloc I get the error message: "IndexError: boolean index did not match indexed array along dimension 1; dimension is 34 but corresponding boolean dimension is 7".
Another attempt was to leave out the masking, but that gives me the error message "ValueError: invalid number of dimensions in xi":
m= ((int(max(Y))-int(min(Y)))*10)-1
XX,YY = np.meshgrid(np.linspace(min(X),max(X),(max(X)-min(X)+1)),np.linspace(min(Y),max(Y),m))
ZZ = griddata(np.array([X,Y]).T,np.array(Z),(XX,YY), method='linear')
Is it possible to use griddata on my dataset? And does it also compute values that are not present in the DataFrame above (for example strike "30" with maturity "100")?
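For reference, here is a minimal sketch of the input layout that griddata seems to expect: one row per known (maturity, strike) point and a matching 1-D array of values, with the NaN cells dropped beforehand. The small DataFrame `df` and the query points are made up for illustration and are not my real data. The coordinate grids are built from the DataFrame's own columns and index so that the NaN mask has the same shape as the grids.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata

# Made-up frame with the same layout: index = strike, columns = days to maturity.
df = pd.DataFrame(
    [[np.nan, 65.0, 58.0, 57.0],
     [62.0, 55.0, 53.0, 51.0],
     [57.0, 52.0, 51.0, 50.0],
     [54.0, 50.0, 49.0, 48.0]],
    index=[15.0, 18.0, 19.0, 20.0],
    columns=[42, 77, 133, 224],
)

# Coordinate grids with the same shape as the data, so the mask lines up.
XX, YY = np.meshgrid(df.columns.to_numpy(dtype=float),
                     df.index.to_numpy(dtype=float))
Z = df.to_numpy(dtype=float)
valid = ~np.isnan(Z)

# Scattered input: one (maturity, strike) row per known volatility.
points = np.column_stack([XX[valid], YY[valid]])
values = Z[valid]

# Query inside the range covered by the data.
inside = griddata(points, values, np.array([[100.0, 18.5]]), method="linear")

# Query outside the convex hull of the data: griddata does not extrapolate
# here and returns its fill_value, which defaults to NaN.
outside = griddata(points, values, np.array([[100.0, 30.0]]), method="linear")

print(inside, outside)
```

In this sketch the query outside the convex hull of the known points comes back as NaN, so griddata with "linear" or "cubic" fills gaps between known points but does not extrapolate. Because days and strikes live on very different scales, passing `rescale=True` to griddata may also be worth trying.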
Interp2d and RBFInterpolator:
I also looked into these functions; they were especially interesting because of the possibility to extrapolate the data. I tried this:
X=df_test.columns.values
Y=df_test.index.values
Z=df_test.values
XX, YY = np.meshgrid(X,Y)
rbf = interp.RBFInterpolator(np.array([X,Y]).T,Z,
smoothing=0, kernel='cubic')
There was no error, but when I pass numbers to rbf, it only returns "nan" values. Have I used it correctly? Or should I give interp2d a try?
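One possible explanation for the "nan" output is that the NaN cells in Z are passed straight into the interpolator and propagate through the fit. A minimal sketch under that assumption, again with a made-up DataFrame `df` rather than my real data: flatten the table into scattered (maturity, strike) points, drop the NaN cells, and only then build the RBFInterpolator.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import RBFInterpolator

# Made-up frame with the same layout: index = strike, columns = days to maturity.
df = pd.DataFrame(
    [[np.nan, 65.0, 58.0, 57.0],
     [62.0, 55.0, 53.0, 51.0],
     [57.0, 52.0, 51.0, 50.0],
     [54.0, 50.0, 49.0, 48.0]],
    index=[15.0, 18.0, 19.0, 20.0],
    columns=[42, 77, 133, 224],
)

XX, YY = np.meshgrid(df.columns.to_numpy(dtype=float),
                     df.index.to_numpy(dtype=float))
Z = df.to_numpy(dtype=float)
valid = np.isfinite(Z)

# RBFInterpolator expects points of shape (n_points, 2) and one value per point.
points = np.column_stack([XX[valid], YY[valid]])
values = Z[valid]

rbf = RBFInterpolator(points, values, smoothing=0, kernel="cubic")

# Queries are also passed as an (n_queries, 2) array of (maturity, strike).
query = np.array([[100.0, 18.5],    # between known points
                  [100.0, 30.0]])   # outside the strike range of the data
vols = rbf(query)
print(vols)
```

With smoothing=0 the fitted surface passes through the known points, and the second query returns a finite number even though it lies outside the data, so this does extrapolate. Whether the extrapolated volatilities are sensible is a separate question. As far as I can tell, interp2d is deprecated in recent SciPy releases, so it is probably not the better alternative.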
Thank you very much in advance.