
I have a DataFrame, say a volatility surface with time as the index and strike as the columns. How do I do two-dimensional interpolation? I can reindex, but how do I deal with the NaN values? I know we can use fillna(method='pad'), but that is not even linear interpolation. Is there a way to plug in our own interpolation method?

piRSquared
archlight

2 Answers


You can use DataFrame.interpolate to get a linear interpolation.

In : import numpy, pandas

In : df = pandas.DataFrame(numpy.random.randn(5,3), index=['a','c','d','e','g'])

In : df
Out:
          0         1         2
a -1.987879 -2.028572  0.024493
c  2.092605 -1.429537  0.204811
d  0.767215  1.077814  0.565666
e -1.027733  1.330702 -0.490780
g -1.632493  0.938456  0.492695

In : df2 = df.reindex(['a','b','c','d','e','f','g'])

In : df2
Out:
          0         1         2
a -1.987879 -2.028572  0.024493
b       NaN       NaN       NaN
c  2.092605 -1.429537  0.204811
d  0.767215  1.077814  0.565666
e -1.027733  1.330702 -0.490780
f       NaN       NaN       NaN
g -1.632493  0.938456  0.492695

In : df2.interpolate()
Out:
          0         1         2
a -1.987879 -2.028572  0.024493
b  0.052363 -1.729055  0.114652
c  2.092605 -1.429537  0.204811
d  0.767215  1.077814  0.565666
e -1.027733  1.330702 -0.490780
f -1.330113  1.134579  0.000958
g -1.632493  0.938456  0.492695
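Note that the default `method='linear'` ignores the index values and treats the rows as equally spaced, which is harmless for the letter index above. On a volatility surface the time and strike grids are usually uneven, so `method='index'`, which uses the numeric index values as x-coordinates, is probably closer to what you want. A minimal sketch on a made-up surface (the `surface` data is purely illustrative):

```python
import numpy as np
import pandas as pd

# Made-up volatility-surface slice: rows are maturities in years (unevenly
# spaced), columns are strikes; the 0.5y row is missing.
surface = pd.DataFrame(
    {90: [0.30, np.nan, 0.20], 100: [0.25, np.nan, 0.15]},
    index=[0.25, 0.5, 1.25],
)

# Default method='linear': rows are treated as equally spaced, so the gap is
# filled with the midpoint of its neighbours (about 0.25 for strike 90).
by_position = surface.interpolate()

# method='index': weights by the index values; 0.5 is a quarter of the way
# from 0.25 to 1.25, giving about 0.275 for strike 90.
by_index = surface.interpolate(method="index")

print(by_position)
print(by_index)
```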

For anything more complex, you need to roll out your own function that takes a Series, fills the NaN values however you like, and returns another Series. You can then apply it to every column with DataFrame.apply (or to every row with axis=1).
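As a sketch of plugging in your own rule, here is a hypothetical helper, `interp_with_flat_ends`, that interpolates linearly against the numeric index using `numpy.interp` and holds the first/last known value flat beyond the ends. It assumes a numeric, unique index; the function name and the data are illustrative only:

```python
import numpy as np
import pandas as pd


def interp_with_flat_ends(s: pd.Series) -> pd.Series:
    """Hypothetical custom filler: linear interpolation against the numeric
    index values, holding the first/last known value flat beyond the ends.
    Assumes the index is numeric and unique."""
    ordered = s.sort_index()
    known = ordered.notna().to_numpy()
    if not known.any():
        return s  # nothing to interpolate from; leave the column as-is
    x = ordered.index.to_numpy(dtype=float)
    y = ordered.to_numpy(dtype=float)
    # numpy.interp needs the known x-values in increasing order (guaranteed by
    # sort_index); outside them it returns the first/last known value.
    filled = np.interp(x, x[known], y[known])
    # Restore the caller's original row order.
    return pd.Series(filled, index=ordered.index, name=s.name).reindex(s.index)


df = pd.DataFrame(
    {90: [np.nan, 0.30, np.nan, 0.20, np.nan]},
    index=[0.1, 0.25, 0.5, 1.25, 2.0],
)

# Apply column by column; each column is passed in as a Series.
filled = df.apply(interp_with_flat_ends)
print(filled)
```

Passing `axis=1` to `apply` would run the same rule across each row instead, provided the column labels are numeric (e.g. strikes).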

ayhan
Avaris
  • It would be a good idea to incorporate this as an option in fillna. – DanB Sep 11 '12 at 04:05
  • What if there is another dimension (or category) to hold constant (separate) in the interpolation step? I.e., how can I combine your wonderful solution with a groupby? Right now, if there are repeated values of the index (e.g. they are identical across the different categories I wish to group by), the reindex() step fails, claiming "Reindexing only valid with uniquely valued Index objects". (Maybe this should be a new question?) – CPBL May 26 '13 at 02:46
  • That's a great and somewhat obscure answer. It would be nice to have a convenience function for this where you can pick the axes to interpolate over. – Bicubic Jun 04 '13 at 05:56
  • Could also use DataFrame's interpolate method? `df2.interpolate()` because `df2.interpolate() == df2.apply(pandas.Series.interpolate)` (at least for me, `pandas.__version__ == 0.14`) – Nate Anderson Nov 30 '14 at 19:41
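For the groupby question in the comments, one possible approach (a sketch, not checked against every pandas version) is to interpolate inside each group with `groupby(...).transform`, so the non-unique index never has to be reindexed as a whole. The `underlying`/`vol` frame below is made-up example data:

```python
import numpy as np
import pandas as pd

# Made-up long-format data: two underlyings share the same time points,
# so the index [1, 2, 3, 1, 2, 3] is not unique.
long_df = pd.DataFrame(
    {
        "underlying": ["A", "A", "A", "B", "B", "B"],
        "vol": [0.20, np.nan, 0.30, 0.50, np.nan, 0.40],
    },
    index=[1, 2, 3, 1, 2, 3],
)

# Interpolate within each underlying only; each group has a unique index,
# and transform returns the results in the original row order.
filled_vol = long_df.groupby("underlying")["vol"].transform(
    lambda s: s.interpolate(method="index")
)
# Assign positionally to sidestep label alignment on the non-unique index.
long_df["vol_filled"] = filled_vol.to_numpy()
print(long_df)
```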

This is an old thread, but I thought I would share my solution for 2D interpolation/extrapolation that respects the index values and computes points on demand. The code ended up a bit awkward, so let me know if there is a better solution:

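import numpy
import pandas
from numpy import nan

dataGrid = pandas.DataFrame({1: {1: 1, 3: 2},
                             2: {1: 3, 3: 4}})


def getExtrapolatedInterpolatedValue(x, y):
    """Return the value at row x, column y. Missing rows/columns are added to
    the global grid and filled by index-aware linear interpolation, with flat
    extrapolation (ffill/bfill) outside the existing range."""
    global dataGrid
    if x not in dataGrid.index:
        # Add an empty row for x, then interpolate down each column.
        dataGrid.loc[x] = nan
        dataGrid = dataGrid.sort_index()
        dataGrid = dataGrid.interpolate(method='index', axis=0).ffill(axis=0).bfill(axis=0)

    if y not in dataGrid.columns:
        # Add an empty column for y, then interpolate across each row.
        dataGrid = dataGrid.reindex(columns=numpy.append(dataGrid.columns.values, y))
        dataGrid = dataGrid.sort_index(axis=1)
        dataGrid = dataGrid.interpolate(method='index', axis=1).ffill(axis=1).bfill(axis=1)

    return dataGrid.loc[x, y]


print(getExtrapolatedInterpolatedValue(2, 1.4))  # prints 2.3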
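If you need the whole surface on a new grid rather than one point at a time, the same two-pass idea (interpolate along the index, then along the columns) can be done in one call without mutating a global. This is a hypothetical helper, `regrid_surface`, assuming numeric, unique row and column labels; on a complete rectangular grid the two linear passes amount to bilinear interpolation:

```python
import numpy as np
import pandas as pd


def regrid_surface(surface, new_index, new_columns):
    """Hypothetical helper: evaluate a gridded surface at new row/column
    labels by linear interpolation along the index and then along the
    columns, using the numeric labels as coordinates and holding edge values
    flat outside the original grid. Assumes numeric, unique labels."""
    # union() sorts numeric labels by default, which interpolate needs.
    rows = surface.index.union(pd.Index(new_index))
    cols = surface.columns.union(pd.Index(new_columns))
    grid = surface.reindex(index=rows, columns=cols).astype(float)
    # Pass 1: fill the new rows down each existing column
    # (brand-new columns are still all NaN after this pass).
    grid = grid.interpolate(method="index", axis=0).ffill(axis=0).bfill(axis=0)
    # Pass 2: fill the new columns across each row.
    grid = grid.interpolate(method="index", axis=1).ffill(axis=1).bfill(axis=1)
    return grid.loc[list(new_index), list(new_columns)]


# Same toy grid as in the answer above: rows 1 and 3, columns 1 and 2.
data_grid = pd.DataFrame({1: {1: 1, 3: 2}, 2: {1: 3, 3: 4}})

print(regrid_surface(data_grid, [2], [1.4]))      # interior point
print(regrid_surface(data_grid, [0, 5], [1, 2]))  # flat extrapolation
```

For the interior point (2, 1.4) this reproduces the answer's value of 2.3.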
Emil Stenström
Nick Holden