Finding the NaN boundary of a matrix in MATLAB

Question

I have a very large (2019x1678 double) DEM (digital elevation model) file put as a matrix in MATLAB. The edges of it contain NaN values. In order to account for edge effects in my code, I have to put a 1 cell buffer (same value as adjacent cell) around my DEM. Where NaNs are present, I need to find the edge of the NaN values in order to build that buffer. I have tried doing this two ways:

In the first I get the row and column coordinates all non-NaN DEM values, and find the first and last row numbers for each column to get the north and south boundaries, then find the first and last column numbers for each row to get the east and west boundaries. I use these in the sub2ind() to create my buffer.

[r, c] = find(~isnan(Zb_ext)); %Zb is my DEM matrix
idx = accumarray(c, r, [], @(x) {[min(x) max(x)]});
idx = vertcat(idx{:});

NorthBoundary_row = transpose(idx(:,1)); % the values to fill my buffer with
NorthBoundary_row_ext = transpose(idx(:,1) - 1); % My buffer cells
columnmax = length(NorthBoundary_row);
column1 = min(c);
Boundary_Colu = linspace(column1,column1+columnmax-1,columnmax);
SouthBoundary_row = (transpose(idx(:,2))); % Repeat for south Boundary
SouthBoundary_row_ext = transpose(idx(:,2) + 1);

SouthB_Ind = sub2ind(size(Zb_ext),SouthBoundary_row,Boundary_Colu);
SouthB_Ind_ext = sub2ind(size(Zb_ext),SouthBoundary_row_ext, Boundary_Colu);
NorthB_Ind = sub2ind(size(Zb_ext),NorthBoundary_row, Boundary_Colu);
NorthB_Ind_ext = sub2ind(size(Zb_ext),NorthBoundary_row_ext, Boundary_Colu);

Zb_ext(NorthB_Ind_ext) = Zb_ext(NorthB_Ind);
Zb_ext(SouthB_Ind_ext) = Zb_ext(SouthB_Ind);

% Repeat above for East and West Boundary by reversing the roles of row and 
% column

[r, c] = find(~isnan(Zb_ext));
idx = accumarray(r, c, [], @(x) {[min(x) max(x)]});
idx = vertcat(idx{:});

EastBoundary_colu = transpose(idx(:,1)); % Repeat for east Boundary
EastBoundary_colu_ext = transpose(idx(:,1) - 1); 
row1 = min(r);
rowmax = length(EastBoundary_colu);
Boundary_row = linspace(row1,row1+rowmax-1,rowmax);
WestBoundary_colu = transpose(idx(:,2)); % Repeat for west Boundary
WestBoundary_colu_ext = transpose(idx(:,2) + 1);

EastB_Ind = sub2ind(size(Zb_ext),Boundary_row, EastBoundary_colu);
EastB_Ind_ext = sub2ind(size(Zb_ext),Boundary_row, EastBoundary_colu_ext);
WestB_Ind = sub2ind(size(Zb_ext),Boundary_row, WestBoundary_colu);
WestB_Ind_ext = sub2ind(size(Zb_ext),Boundary_row, WestBoundary_colu_ext);

Zb_ext(NorthB_Ind_ext) = Zb_ext(NorthB_Ind);
Zb_ext(SouthB_Ind_ext) = Zb_ext(SouthB_Ind);
Zb_ext(EastB_Ind_ext) = Zb_ext(EastB_Ind);
Zb_ext(WestB_Ind_ext) = Zb_ext(WestB_Ind);

This works well on my small development matrix, but fails on my full sized DEM. I do not understand the behavior of my code, but looking at the data there are gaps in my boundary. I wonder if I need to better control the order of max/min row/column values, though in my test on a smaller dataset, all seemed in order....

The second method I got from a similar question to this and basically uses a dilation method. However, when I transition to my full dataset, it takes hours to calculate ZbDilated. Although my first method does not work, it at least calculates within seconds.

[m, n] = size(Zb); % 
Zb_ext = nan(size(Zb)+2);
Zb_ext(2:end-1, 2:end-1) = Zb; % pad Zb with zeroes on each side
ZbNANs = ~isnan(Zb_ext);
ZbDilated = zeros(m + 2, n + 2); % this will hold the dilated shape.

for i = 1:(m+2)
    if i == 1 %handling boundary situations during dilation
        i_f = i;
        i_l = i+1;
    elseif i == m+2
        i_f = i-1;
        i_l = i;
    else
        i_f = i-1;
        i_l = i+1;
    end
    for j = 1:(n+2)
        mask = zeros(size(ZbNANs));
        if j == 1 %handling boundary situations again
            j_f = j;
            j_l = j+1;
        elseif j == n+2
            j_f = j-1;
            j_l = j;
        else
            j_f = j-1;
            j_l = j+1;
        end
        
        mask(i_f:i_l, j_f:j_l) = 1; % this places a 3x3 square of 1's around (i, j)
        ZbDilated(i, j) = max(ZbNANs(logical(mask)));
    end
end

Zb_ext(logical(ZbDilated)) = fillmissing(Zb_ext(logical(ZbDilated)),'nearest');

Does anyone have any ideas on making either of these usable?

Here is what I start out with:

   NaN   NaN     2     5    39    55    44     8   NaN   NaN
   NaN   NaN   NaN     7    33    48    31    66    17   NaN
   NaN   NaN   NaN    28   NaN    89   NaN   NaN   NaN   NaN

Here is the matrix buffered on the limits with NaNs:

   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN     2     5    39    55    44     8   NaN   NaN   NaN
   NaN   NaN   NaN   NaN     7    33    48    31    66    17   NaN   NaN
   NaN   NaN   NaN   NaN    28   NaN    89   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN

Here is what I want to get after using fillmissing (though I have noticed some irregularities with how buffer values are filled...):

   NaN   NaN     2     2     5    39    55    44     8    17   NaN   NaN
   NaN   NaN     2     2     5    39    55    44     8    17    17   NaN
   NaN   NaN     2     2     7    33    48    31    66    17    17   NaN
   NaN   NaN   NaN     2    28    33    89    31    66    17    17   NaN
   NaN   NaN   NaN     5    28    55    89     8   NaN   NaN   NaN   NaN

To try and clear up any confusion about what I am doing, here is the logical I get from dilation I use for fillmissing

       0   0   1     1    1    1    1    1   1   1   0   0
       0   0   1     1    1    1    1    1   1   1   1   0
       0   0   1     1    1    1    1    1   1   1   1   0
       0   0   0     1    1    1    1    1   1   1   1   0
       0   0   0     1    1    1    1    1   0   0   0   0

What is a DEM? For dilation use [`imdilate`](https://www.mathworks.com/help/images/ref/imdilate.html). — Cris Luengo, Jun 22 '22 at 02:07
@CrisLuengo I though imdilate was available through the image package? I am trying to avoid using outside packages for my code. — Alexander Audet, Jun 22 '22 at 02:53
Processing a 2019x1678 image should not take hours. I suspect that what's taking all of the time is generating a full size mask for each 3x3 window rather than just using direct indexing. For your code without the masking, I'm getting around 80 seconds in Octave; MATLAB is likely faster. But that's still glacially slow, as `imdilate` is around 60 **milli**seconds. Are the values in the last matrix what you actually want? For example, I don't see how the value at 5,4 should be a 5. Normal 8-neighbor dilation would make that a 28. — beaker, Jun 22 '22 at 05:50
@CrisLuengo I guess that's my question: what is the desired behavior, the output of `imdilate` or the output of `fillmissing`? It just seems odd to me to do 2d dilation and then 1d extrapolation. If you're going to use `fillmissing` anyway and throw out the values from the dilated image, why not create `ZbDilated` as a logical mask from the beginning using OR rather than a double using `max`? — beaker, Jun 22 '22 at 14:13
@beaker: Indeed, there is no point in using `max` there. What is being dilated is the logical matrix `~isnan(Zb_ext)`. `any` would be faster because it can short-circuit. — Cris Luengo, Jun 22 '22 at 14:24
@beaker I want the output of fillmissing but am using dilate to create a logical mask to control the extent of fillmissing. (if I fill in everything, it screws with my model. But I need a 1 cell buffer). To be honest, I am still working on the best way to fill these buffer values as they are used in a shear stress component calculations, but we don't want them to actually add any shear stress at the borders. Fillmissing might not be right for the job, but that will be another question if anything. — Alexander Audet, Jun 22 '22 at 17:53
That's why I was questioning the use of `fillmissing`. Seems to me that it would be easier to replicate the behavior of `imdilate` and then use `isnan(Zb_ext)` as a mask to choose the buffer values to put back into `Zb_ext`. — beaker, Jun 22 '22 at 22:44
@beaker That might actually work if I use the first part of the answer below. Even removing the mask as you suggest only cuts the time by half for some reason (still 2 hours). Not sure why you only get 80 seconds.... — Alexander Audet, Jun 23 '22 at 04:39

Cris Luengo · Accepted Answer · 2022-06-22T14:29:40.153

A faster way to apply a 3x3 dilation would be as follows. This does involve some large intermediate matrices, which make it less efficient than, say applying imdilate.

[m, n] = size(Zb); % 
Zb_ext = nan(size(Zb)+2);
Zb_ext(2:end-1, 2:end-1) = Zb; % pad A with zeroes on each side
ZbNANs = ~isnan(Zb_ext);
ZbDilated = ZbNANs; % this will hold the dilated shape.

% up and down neighbors
ZbDilated(2:end, :) = max(ZbDilated(2:end, :), ZbNANs(1:end-1, :));
ZbDilated(1:end-1, :) = max(ZbDilated(1:end-1, :), ZbNANs(2:end, :));

% left and right neighbors
ZbDilated(:, 2:end) = max(ZbDilated(:, 2:end), ZbNANs(:, 1:end-1));
ZbDilated(:, 1:end-1) = max(ZbDilated(:, 1:end-1), ZbNANs(:, 2:end));

% and 4 diagonal neighbors
ZbDilated(2:end, 2:end) = max(ZbDilated(2:end, 2:end), ZbNANs(1:end-1, 1:end-1));
ZbDilated(1:end-1, 2:end) = max(ZbDilated(1:end-1, 2:end), ZbNANs(2:end, 1:end-1));
ZbDilated(2:end, 1:end-1) = max(ZbDilated(2:end, 1:end-1), ZbNANs(1:end-1, 2:end));
ZbDilated(1:end-1, 1:end-1) = max(ZbDilated(1:end-1, 1:end-1), ZbNANs(2:end, 2:end));

This is a tedious way to write it, I'm sure there's a loop that can be written that is shorter, but this I think makes the intention clearer.

[Edit: Because we're dealing with a logical array here, instead of max(A,B) we could also do A | B. I'm not sure if there would be any difference in time.]

What @beaker said in a comment was to not use

mask = zeros(size(ZbNANs));
mask(i_f:i_l, j_f:j_l) = 1; % this places a 3x3 square of 1's around (i, j)
ZbDilated(i, j) = max(ZbNANs(logical(mask)));

but rather do

ZbDilated(i, j) = max(ZbNANs(i_f:i_l, j_f:j_l), [], 'all');

[Edit: Because we're dealing with a logical array here, instead of max(A,[],'all') we could also do any(A,'all'), which should be faster. See @beaker's other comment.]

The first part of this works, and is fast. I do not see any gaps in my buffer. I replaced Max() with A | B. The second part did speed up my code, but it would still have taken over 2 hours. (down from 4-5 hours). — Alexander Audet, Jun 22 '22 at 17:44

Finding the NaN boundary of a matrix in MATLAB

1 Answers1

Linked