10

I have a 2d array (doubles) representing some data, and it has a bunch of NaNs in it. The contour plot of the data looks like this:

prefiltered

All of the white spaces are NaNs, the gray diamond is there for reference, and the filled contour shows the shape of my data. When I filter the data with imfilt, the NaNs significantly chew into the data, so we end up with something like this:

postfiltered

You can see that the support set is significantly contracted. I can't use this, as it has chewed into some of the more interesting variations on the edges (for reasons specific to my experiments, those edges are important).

Is there a function to filter within an island of NaNs that treats edges similar to edges of rectangular filtering windows, instead of just killing the edges? Sort of like an nanmean function, except for convolving images?

Here is my filter code:

filtWidth = 7;
imageFilter=fspecial('gaussian',filtWidth,filtSigma);
%convolve them
dataFiltered = imfilter(rfVals,imageFilter,'symmetric','conv');

and the code for plotting the contour plot:

figure
contourf(dataFiltered); hold on
plot([-850 0 850 0 -850], [0 850 0 -850 0], 'Color', [.7 .7 .7],'LineWidth', 1); %the square (limits are data-specific)
axis equal

There is some code at the Mathworks file exchange (ndanfilter.m) that comes close to what I want, but I believe it only interpolates NaNs that are sprinkled on the interior of an image, not data showing this island-type effect.

Note: I just found nanconv.m, which does exactly what I want, with a very intuitive usage (convolve an image, ignoring NaN, much like nanmean works). I've made this part of my accepted answer, and include a comparison to the performance of the other answers.

Related questions

Gaussian filtering a image with Nan in Python

Community
  • 1
  • 1
eric
  • 7,142
  • 12
  • 72
  • 138

4 Answers4

9

The technique I ended up using was the function nanconv.m at Matlab's File Exchange. It does exactly what I was looking for: it runs the filter in a way that ignores the NaNs just the way that Matlab's built-in function nanmean does. This is a hard to decipher from the documentation of the function, which is a tad cryptic.

Here's how I use it:

filtWidth = 7;
filtSigma = 5;
imageFilter=fspecial('gaussian',filtWidth,filtSigma);
dataFiltered = nanconv(data,imageFilter, 'nanout');

I'm pasting the nanconv function below (it is covered by the BSD license). I will post images etc when I get a chance, just wanted to post what I ended up doing for anyone curious about what I did.

Comparison to other answers

Using gnovice's solution the results look intuitively very nice, but there are some quantitative blips on the edges that were a concern. In practice, the extrapolation of the image beyond the edges led to many spuriously high values at the edges of my data.

Using krisdestruction's suggestion of replacing the missing bits with the original data, also looks pretty decent (especially for very small filters), but (by design) you end up with unfiltered data at the edges, which is a problem for my application.

nanconv

function c = nanconv(a, k, varargin)
% NANCONV Convolution in 1D or 2D ignoring NaNs.
%   C = NANCONV(A, K) convolves A and K, correcting for any NaN values
%   in the input vector A. The result is the same size as A (as though you
%   called 'conv' or 'conv2' with the 'same' shape).
%
%   C = NANCONV(A, K, 'param1', 'param2', ...) specifies one or more of the following:
%     'edge'     - Apply edge correction to the output.
%     'noedge'   - Do not apply edge correction to the output (default).
%     'nanout'   - The result C should have NaNs in the same places as A.
%     'nonanout' - The result C should have ignored NaNs removed (default).
%                  Even with this option, C will have NaN values where the
%                  number of consecutive NaNs is too large to ignore.
%     '2d'       - Treat the input vectors as 2D matrices (default).
%     '1d'       - Treat the input vectors as 1D vectors.
%                  This option only matters if 'a' or 'k' is a row vector,
%                  and the other is a column vector. Otherwise, this
%                  option has no effect.
%
%   NANCONV works by running 'conv2' either two or three times. The first
%   time is run on the original input signals A and K, except all the
%   NaN values in A are replaced with zeros. The 'same' input argument is
%   used so the output is the same size as A. The second convolution is
%   done between a matrix the same size as A, except with zeros wherever
%   there is a NaN value in A, and ones everywhere else. The output from
%   the first convolution is normalized by the output from the second 
%   convolution. This corrects for missing (NaN) values in A, but it has
%   the side effect of correcting for edge effects due to the assumption of
%   zero padding during convolution. When the optional 'noedge' parameter
%   is included, the convolution is run a third time, this time on a matrix
%   of all ones the same size as A. The output from this third convolution
%   is used to restore the edge effects. The 'noedge' parameter is enabled
%   by default so that the output from 'nanconv' is identical to the output
%   from 'conv2' when the input argument A has no NaN values.
%
% See also conv, conv2
%
% AUTHOR: Benjamin Kraus (bkraus@bu.edu, ben@benkraus.com)
% Copyright (c) 2013, Benjamin Kraus
% $Id: nanconv.m 4861 2013-05-27 03:16:22Z bkraus $

% Process input arguments
for arg = 1:nargin-2
    switch lower(varargin{arg})
        case 'edge'; edge = true; % Apply edge correction
        case 'noedge'; edge = false; % Do not apply edge correction
        case {'same','full','valid'}; shape = varargin{arg}; % Specify shape
        case 'nanout'; nanout = true; % Include original NaNs in the output.
        case 'nonanout'; nanout = false; % Do not include NaNs in the output.
        case {'2d','is2d'}; is1D = false; % Treat the input as 2D
        case {'1d','is1d'}; is1D = true; % Treat the input as 1D
    end
end

% Apply default options when necessary.
if(exist('edge','var')~=1); edge = false; end
if(exist('nanout','var')~=1); nanout = false; end
if(exist('is1D','var')~=1); is1D = false; end
if(exist('shape','var')~=1); shape = 'same';
elseif(~strcmp(shape,'same'))
    error([mfilename ':NotImplemented'],'Shape ''%s'' not implemented',shape);
end

% Get the size of 'a' for use later.
sza = size(a);

% If 1D, then convert them both to columns.
% This modification only matters if 'a' or 'k' is a row vector, and the
% other is a column vector. Otherwise, this argument has no effect.
if(is1D);
    if(~isvector(a) || ~isvector(k))
        error('MATLAB:conv:AorBNotVector','A and B must be vectors.');
    end
    a = a(:); k = k(:);
end

% Flat function for comparison.
o = ones(size(a));

% Flat function with NaNs for comparison.
on = ones(size(a));

% Find all the NaNs in the input.
n = isnan(a);

% Replace NaNs with zero, both in 'a' and 'on'.
a(n) = 0;
on(n) = 0;

% Check that the filter does not have NaNs.
if(any(isnan(k)));
    error([mfilename ':NaNinFilter'],'Filter (k) contains NaN values.');
end

% Calculate what a 'flat' function looks like after convolution.
if(any(n(:)) || edge)
    flat = conv2(on,k,shape);
else flat = o;
end

% The line above will automatically include a correction for edge effects,
% so remove that correction if the user does not want it.
if(any(n(:)) && ~edge); flat = flat./conv2(o,k,shape); end

% Do the actual convolution
c = conv2(a,k,shape)./flat;

% If requested, replace output values with NaNs corresponding to input.
if(nanout); c(n) = NaN; end

% If 1D, convert back to the original shape.
if(is1D && sza(1) == 1); c = c.'; end

end
Community
  • 1
  • 1
eric
  • 7,142
  • 12
  • 72
  • 138
  • 1
    Note, a recent submission to the File Exchange has a slightly more flexible implementation: https://www.mathworks.com/matlabcentral/fileexchange/20417-ndnanfilter-m – John Dec 12 '19 at 00:23
3

One approach would be to replace the NaN values with nearest-neighbor interpolates using scatteredInterpolant (or TriScatteredInterp in older MATLAB versions) before performing the filtering, then replacing those points again with NaN values afterward. This would be akin to filtering a full 2-D array using the 'replicate' argument as opposed to the 'symmetric' argument as a boundary option for imfilter (i.e. you're replicating as opposed to reflecting values at the jagged NaN boundary).

Here's what the code would look like:

% Make your filter:
filtWidth = 7;
imageFilter = fspecial('gaussian', filtWidth, filtWidth);

% Interpolate new values for Nans:
nanMask = isnan(rfVals);
[r, c] = find(~nanMask);
[rNan, cNan] = find(nanMask);
F = scatteredInterpolant(c, r, rfVals(~nanMask), 'nearest');
interpVals = F(cNan, rNan);
data = rfVals;
data(nanMask) = interpVals;

% Filter the data, replacing Nans afterward:
dataFiltered = imfilter(data, imageFilter, 'replicate', 'conv');
dataFiltered(nanMask) = nan;
gnovice
  • 125,304
  • 15
  • 256
  • 359
  • do you have any understanding of the difference between using `imfilter(....'replicate')` and the function `nanconv` (http://www.mathworks.com/matlabcentral/fileexchange/41961-nanconv)? They are both giving similar results, but I'm not really understanding how either one works, frankly. – eric Apr 23 '15 at 21:37
  • looks very promising (my version I only have `TriScatteredInterp` but it works the exact same). Am presently comparing it to performance of `nanconv`, in practice.... – eric Apr 23 '15 at 21:40
  • @neuronet I'm not familiar with `nanconv` or the underlying code is uses, so I'm not certain yet what the differences might be. – gnovice Apr 23 '15 at 21:43
  • thanks I will look into it, might be tomorrow. Would prefer native Matlab code, though, so will go with your answer everything else being equal. – eric Apr 23 '15 at 21:43
  • @gnovice Wouldn't the interpolation change the effect of the filter has on the data? – krisdestruction Apr 24 '15 at 03:29
  • @gnovice so basically we are 1) *extrapolating* at the edges to fill the entire matrix with numbers instead of `NaNs` (while we use a function called interpolate, it seems more like extrapolation way beyond the actual data), 2) running the filter on the new matrix, and 3) apply a mask to restrict the results to the original support set. This seems to work pretty well. If you don't mind I'll edit your answer, with some figures, to make it crystal clear... – eric Apr 24 '15 at 13:36
  • @gnovice: would you mind if I replaced your `scatteredInterpolant` with `TriScatteredInterp`? The latter is available on more (i.e., older) versions of Matlab. Will get to it this weekend, am making images of every step in your algorithm from my data it should be pretty cool... – eric Apr 24 '15 at 14:23
  • 1
    @neuronet: Adding some figures using the data from your question would be good. I'll edit in a mention of `TriScatteredInterp` for older versions. – gnovice Apr 24 '15 at 19:35
  • @krisdestruction: That's always an issue at boundaries, and as neuronet points out the interpolation done here is more of an extrapolation of data into the regions filled with NaNs.The choice of how this extrapolation is done will certainly change the results of the filtering at the edges of the non-NaN data. – gnovice Apr 24 '15 at 20:09
  • @gnovice Well yes I agree, and obviously the method depends on the OP's preference of the edge cases. However I was asking about the non NaN data. Since I'm not familiar with `scatteredInterpolant` can I assume the non-nan data in the middle of the output of our methods will be the same? – krisdestruction Apr 24 '15 at 20:27
  • 1
    @krisdestruction yes it is the same. It only interpolates where there are no data points. I have been playing with this a lot today, and while it looks intuitively good, quantitatively the edges are a bit weird, as filtering the extrapolated image amplifies little variations at the edges. The `nanconv` function I found seems to work a bit better, much more like nanmean in that it simply *ignores* the nans, much like the `nanmean` built-in function. It is really a quite clever function. Will post this as a separate answer most likely, so busy with work (where I'm analyzing data like this :)). – eric Apr 24 '15 at 20:49
  • @neuronet Okay in that case can you post a separate answer and accept that one? That way future viewers will be able to see the answer you used. – krisdestruction Apr 24 '15 at 21:02
0

Okay without using your plot function, I can still give you a solution. What you want to do is find all the new NaN's and replace it with the original unfiltered data (assuming it is correct). While it's not filtered, it's better than reducing the domain of your contour image.

% Toy Example Data
rfVals= rand(100,100);
rfVals(1:2,:) = nan;
rfVals(:,1:2) = nan;

% Create and Apply Filter
filtWidth = 3;
imageFilter=fspecial('gaussian',filtWidth,filtWidth);
dataFiltered = imfilter(rfVals,imageFilter,'symmetric','conv');
sum(sum(isnan( dataFiltered ) ) )

% Replace New NaN with Unfiltered Data
newnan = ~isnan( rfVals) & isnan( dataFiltered );
dataFiltered( newnan ) = rfVals( newnan );
sum(sum(isnan( rfVals) ) )
sum(sum(isnan( dataFiltered ) ) )

Detect new NaN using the following code. You can also probably use the xor function.

newnan = ~isnan( rfVals) & isnan( dataFiltered );

Then this line sets the indices in dataFiltered to the values in rfVals

dataFiltered( newnan ) = rfVals( newnan );

Results

From the lines printed in the console and my code, you can see that the number of NaN in dataFiltered is reduced from 688 to 396 as was the number of NaN in rfVals.

ans =
   688
ans =
   396
ans =
   396

Alternate Solution 1

You can also use a smaller filter near the edges by specifying a smaller kernel and merging it after, but if you just want valid data with minimal code, my main solution will work.

Alternate Solution 2

An alternate approach is to pad/replace the NaN values with zero or some constant you want so that it will work, then truncate it. However for signal processing/filtering, you will probably want my main solution.

krisdestruction
  • 1,950
  • 1
  • 10
  • 20
  • 1
    This will work, and is pretty good, though will introduce discontinuities and unusually high frequencies at the edge. Ideally, we could built something akin to `nanmean`, but a filter that does the apt filtering when convolving, but only to the data points that are not `NaN`. Will play around with your idea, post image of what it looks like... – eric Apr 23 '15 at 20:20
  • I don't want to add zeros (that's what I was doing using a mask) but it pulls the values down at the edges.... – eric Apr 23 '15 at 20:22
  • @neuronet Right that's only an alternate solution and I know you don't want that. My main solution pulls the data that's missing from your main data. – krisdestruction Apr 23 '15 at 20:25
  • @neuronet Another alternate approach is to use a smaller sized filter near the edges. While it's more accurate, there's more code involved. – krisdestruction Apr 23 '15 at 20:28
  • @neuronet After playing around with it, if you still wish for me to elaborate on "Alternate Solution 1", just ping me by posting a comment. Otherwise if it helped you and it works, kindly accept the solution :) – krisdestruction Apr 23 '15 at 20:30
  • 1
    yes of course I will accept if it helps and it works, no need to keep reminding me :) I am looking into something like `nanconvolve` to compare them. Alternate solution is pretty good, but Ockham's razor starts to rear its head. Like I said, looking into what seems most parsimonious alternate that convolves but ignoring `NaNs`. – eric Apr 23 '15 at 20:32
  • http://www.mathworks.com/matlabcentral/fileexchange/41961-nanconv/content/nanconv.m – eric Apr 23 '15 at 20:34
  • @neuronet Yup I know, I just find a lot of my answers go unaccepted even though it's correct so I tend to send 1 comment. Forgot that I already did. I don't know about `nanconvolve`, is it a builtin function? – krisdestruction Apr 23 '15 at 20:34
  • @neuronet I see, interesting! Let me know if it works better for you, I can update my answer with it if it works for you. – krisdestruction Apr 23 '15 at 20:35
-3

nanfilter does exactly the same thing with nanconv when filtering as long as the filter is the same. If you get the nan values before you use nanfilter and then add the back to the after-filtered matrix, you will get the same result with what you get from nanconv with the option 'nanout', as long as you use the same filter.

  • 3
    What is nanfilter? It hasn't been mentioned once in this page yet, I cannot find it as a built-in matlab function, and google reveals nothing. Perhaps you meant to add this as a comment? – eric Jul 09 '15 at 13:26