
I have a one-dimensional array A of floats that is mostly good, but a few of the values are missing. Missing data is replaced with nan (not a number). I have to replace the missing values in the array by linear interpolation from the nearby good values. So, for example:

F7(np.array([10.,20.,nan,40.,50.,nan,30.])) 

should return

np.array([10.,20.,30.,40.,50.,40.,30.]). 

What's the best way of doing this using Python?

Any help would be much appreciated

Thanks

    Do you really mean linear interpolation? Or do you actually mean average? -- I also assume that the first and last values are guaranteed to not be NaN? – mgilson Oct 31 '12 at 20:29
  • It was just an average on the example. The linear interpolation should really just find the missing values in a linear equation. And yeah, the first and last values aren't NaN. –  Oct 31 '12 at 20:43

3 Answers


You could use scipy.interpolate.interp1d:

>>> from scipy.interpolate import interp1d
>>> import numpy as np
>>> x = np.array([10., 20., np.nan, 40., 50., np.nan, 30.])
>>> not_nan = np.logical_not(np.isnan(x))
>>> indices = np.arange(len(x))
>>> interp = interp1d(indices[not_nan], x[not_nan])
>>> interp(indices)
array([ 10.,  20.,  30.,  40.,  50.,  40.,  30.])

EDIT: it took me a while to figure out how np.interp works, but that can do the job as well:

>>> np.interp(indices, indices[not_nan], x[not_nan])
array([ 10.,  20.,  30.,  40.,  50.,  40.,  30.])
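To make this reusable, the `np.interp` approach can be wrapped in a small helper function (the name `fill_nan_linear` is just an illustrative choice, not from the answer above):

```python
import numpy as np

def fill_nan_linear(a):
    """Replace NaNs in a 1-D float array by linear interpolation
    between the nearest non-NaN neighbours on either side."""
    a = np.asarray(a, dtype=float)
    indices = np.arange(len(a))
    not_nan = ~np.isnan(a)
    # np.interp evaluates the piecewise-linear function defined by the
    # good (index, value) pairs at every index, filling the gaps.
    return np.interp(indices, indices[not_nan], a[not_nan])

print(fill_nan_linear(np.array([10., 20., np.nan, 40., 50., np.nan, 30.])))
# [10. 20. 30. 40. 50. 40. 30.]
```

Note that `np.interp` assumes the first and last values are not NaN (as the question guarantees); leading or trailing NaNs would be filled with the nearest edge value rather than extrapolated.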
Fred Foo
    I think I would use `len(x)` rather than `*x.shape`. It seems slightly more explicit since we're only doing 1D anyway (and this doesn't generalize to more dimensions) -- but +1 for a working interpolate solution. – mgilson Oct 31 '12 at 20:36
  • Rather than generating `np.arange(len(x))` twice, why not just do it once and store the result? Also, I don't think you need `scipy` for this. `np.interp` seems like it would do the same thing in this scenario – mgilson Oct 31 '12 at 20:39
  • @mgilson: you were right three times. Thanks, updated the answer. – Fred Foo Oct 31 '12 at 20:43

I would go with pandas. A minimalistic approach with a one-liner:

import numpy as np
from numpy import nan
from pandas import Series

a = np.array([10., 20., nan, 40., 50., nan, 30.])
Series(a).interpolate()

Out[219]:
0    10
1    20
2    30
3    40
4    50
5    40
6    30

Or if you want to keep it as an array:

Series(a).interpolate().values

Out[221]:
array([ 10.,  20.,  30.,  40.,  50.,  40.,  30.])
root

To avoid creating a new Series object, or new items in a Series, every time you want to interpolate data, you can use RedBlackPy. See the code example below:

import redblackpy as rb

# we do not include missing data
index = [0,1,3,4,6]
data = [10,20,40,50,30]
# create Series object
series = rb.Series(index=index, values=data, dtype='float32',
                   interpolate='linear')

# Now you have access at any key using linear interpolation
# Interpolation does not create new items in the Series
print(series[2]) # prints 30
print(series[5]) # prints 40
# print Series and see that keys 2 and 5 do not exist in series
print(series)

The last print produces the following output:

Series object Untitled
0: 10.0
1: 20.0
3: 40.0
4: 50.0
6: 30.0