Modified linear interpolation with missing data

Question

Imagine a set of data with given x-values (as a column vector) and several y-values combined in a matrix (row vector of column vectors). Some of the values in the matrix are not available:

%% Create the test data
N = 1e2; % Number of x-values

x = 2*sort(rand(N, 1))-1;
Y = [x.^2, x.^3, x.^4, x.^5, x.^6]; % Example values
Y(50:80, 4) = NaN(31, 1); % Some values are not available

Now I have a column vector of new x-values for interpolation:

K = 1e2; % Number of interpolation values
x_i = rand(K, 1);

My goal is to find a fast way to interpolate all y-values for the given x_i values. If there are NaN values in the y-values, I want to use the y-value which is before the missing data. In the example case this would be the data in Y(49, :).

If I use interp1, I get NaN-values and the execution is slow for large x and x_i:

starttime = cputime;
Y_i1 = interp1(x, Y, x_i);
executiontime1 = cputime - starttime

An alternative is interp1q, which is about two times faster.

What is a very fast way to do this that also allows for my modification?

Possible ideas:

Do postprocessing of Y_i1 to eliminate NaN-values.
Use a combination of a loop and the find-command to always use the neighbour without interpolation.
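The post-processing idea could be sketched roughly as follows. This assumes a MATLAB release whose `interp1` accepts the `'previous'` method. The small data set and the variable names `Yf`, `Y_prev`, and `gap` are made up for illustration and are not part of the original question.

```matlab
% Small illustrative data set (hypothetical, not the data from the question)
x   = (0:5)';                 % sample points, sorted and distinct
Y   = [x, 10*x];              % two columns of y-values
Y(3:4, 2) = NaN;              % gap in column 2 at x = 2 and x = 3
x_i = [0.5; 1.5; 2.5; 3.5; 4.5];

% Step 1: forward-fill each column with the last valid value above the gap.
% This only depends on Y, so it can be done once and reused for many x_i.
Yf = Y;
for j = 1:size(Y, 2)
    valid = ~isnan(Y(:, j));
    idx   = cumsum(valid);    % position of the last valid entry so far (0 if none yet)
    vals  = Y(valid, j);
    has   = idx > 0;          % rows that have a valid entry at or above them
    Yf(has, j) = vals(idx(has));
end

% Step 2: ordinary linear interpolation; NaN appears wherever a neighbour is NaN.
Y_i = interp1(x, Y, x_i);

% Step 3: replace those NaNs by the last valid value before the gap.
Y_prev = interp1(x, Yf, x_i, 'previous');
gap    = isnan(Y_i);
Y_i(gap) = Y_prev(gap);
```

Values of `x_i` that lie outside the range of `x` remain NaN in this sketch, because both `interp1` calls return NaN there by default.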

Have you thought about using k-nearest-neighbours imputation to fill in the missing fields? There is a MATLAB function for it, http://www.mathworks.com/help/toolbox/bioinfo/ref/knnimpute.html, but it's in the Bioinformatics Toolbox :/ Still, it isn't such a difficult algorithm to implement. — Dan, Aug 22 '12 at 09:10
From what I understand the input of interp1 should not contain nans. Try something like `Y_i1 = interp1(x(~isnan(Y)), Y(~isnan(Y)), x_i);` Probably better linewise. — bdecaf, Aug 22 '12 at 09:28
@Dan: Without a special distance measure, your general idea is to complete the data before doing any further linear interpolation, right? I think this is quite a nice idea, because I run the interpolation very often on the same underlying data. @bdecaf: Just eliminating the `NaN`s leads to strange interpolated values between the last valid value before the `NaN`s and the first valid value after them. Thus, this does not yield the last valid value before the `NaN`s but a mixture of both. — Lukas, Aug 22 '12 at 10:31

score 1 · Accepted Answer · answered Aug 22 '12 at 10:47

Using `interp1` with spline interpolation (the `'spline'` method) ignores `NaN`s.

AGS


Using `spline` is a good idea, although it seems to be slower than linear interpolation. After the interpolation I want to apply different calculations depending on the `x_i`-value. Is there an alternative to looping through all elements with an if-statement for each element? – Lukas Aug 22 '12 at 11:46
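A per-element loop with an if-statement can usually be replaced by logical indexing. A minimal sketch, in which the threshold at zero and both calculations are purely hypothetical placeholders for whatever should be applied:

```matlab
% Hypothetical interpolation points and interpolated values
x_i = [-0.5; 0.2; 0.8; -0.1];
Y_i = [1; 2; 3; 4];

result = zeros(size(Y_i));

% Logical mask selecting the elements that get the first calculation
neg = x_i < 0;

result(neg)  = 2 * Y_i(neg);    % placeholder calculation for x_i < 0
result(~neg) = Y_i(~neg) + 1;   % placeholder calculation for x_i >= 0
```

Each branch runs once on a whole sub-vector, so no explicit loop over the elements is needed.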
