32

From a list of 2D coordinates, and a third variable (velocity), I have created a 2D numpy array covering the whole sampled area. My intention is to create an image, in which each pixel contains the mean velocity of the points lying within it. After that filter that image with a gaussian filter.

The problem is that the area is not uniformly sampled. Therefore I have several pixels without information (Nan) in the middle of the image. When I try to filter the array through a gaussian filter, the Nan propagate ruining the whole image.

I need to filter this image, but rejecting all pixels without information. In other words, If a pixel does not contain information, then it should be not taken into account for the filtering.

Here is an example of my code for averaging:

Mean_V = np.zeros([len(x_bins), len(y_bins)])

for i, x_bin in enumerate(x_bins[:-1]):
    bin_x = (x > x_bins[i]) & (x <= x_bins[i+1])
    for j, y_bin in enumerate(y_bins[:-1]):
        bin_xy = (y[bin_x] > y_bins[j]) & (y[bin_x] <= y_bins[j+1])
        if (sum(x > 0 for x in bin_xy) > 0) :
            Mean_V[i,j]=np.mean(V[bin_x][bin_xy])
        else:
            Mean_V[i,j]=np.nan

EDIT:

Surfing the web I have ended into this question I made in 2013. The solution to this problem can be found in the astropy library:

http://docs.astropy.org/en/stable/convolution/

Astropy's convolution replaces the NaN pixels with a kernel-weighted interpolation from their neighbors.

Thanks folks!!

user2761243
  • 329
  • 1
  • 3
  • 5
  • the scipy.stats package offers the functions nanmean and nanstd, that ignore nan, instead of returning nan. Exchange the numpy.mean / numpy.std in your code by them and everything should be fine (; – Faultier Sep 09 '13 at 13:31
  • Can you post a sample of your averaging code? – Daniel Sep 09 '13 at 13:56
  • Suspect you will have to write the loops to do the convolution and checks your self. – tacaswell Sep 09 '13 at 14:27
  • Hi! My averaging code does not have any mystery. But if you like I will add a sample. – user2761243 Sep 09 '13 at 16:48
  • Check out Matlab version of this question: http://stackoverflow.com/questions/29833068/filter-image-that-contains-nans-in-matlab – eric Apr 23 '15 at 22:09
  • A simple solution might be to reduce `truncate` so that a single / each NaN affects fewer points in your output. – pas-calc Apr 14 '22 at 16:20
  • You might verify that this still ensures that the gaussian kernel is normalized by the [implementation](https://github.com/scipy/scipy/blob/v1.8.0/scipy/ndimage/_filters.py#L189), see [docs - source](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter1d.html). – pas-calc Apr 14 '22 at 16:28

3 Answers3

45

in words:

A Gaussian filter which ignores NaNs in a given array U can be easily obtained by applying a standard Gaussian filter to two auxiliary arrays V and W and by taking the ratio of the two to get the result Z.

Here, V is copy of the original U with NaNs replaced by zeros and W is an array of ones with zeros indicating the positions of NaNs in the original U.

The idea is that replacing the NaNs by zeros introduces an error in the filtered array which can, however, be compensated by applying the same Gaussian filter to another auxiliary array and combining the two.

in Python:

import numpy as np
import scipy as sp
import scipy.ndimage

sigma=2.0                  # standard deviation for Gaussian kernel
truncate=4.0               # truncate filter at this many sigmas

U=sp.randn(10,10)          # random array...
U[U>2]=np.nan              # ...with NaNs for testing

V=U.copy()
V[np.isnan(U)]=0
VV=sp.ndimage.gaussian_filter(V,sigma=sigma,truncate=truncate)

W=0*U.copy()+1
W[np.isnan(U)]=0
WW=sp.ndimage.gaussian_filter(W,sigma=sigma,truncate=truncate)

Z=VV/WW

in numbers:

Here coefficients of the Gaussian filter are set to [0.25,0.50,0.25] for demonstration purposes and they sum up to one 0.25+0.50+0.25=1, without loss of generality.

After replacing the NaNs by zeros and applying the Gaussian filter (cf. VV below) it is clear that the zeros introduce an error, i.e., due to the "missing" data the coefficients 0.25+0.50=0.75 do not sum up to one anymore and therefore underestimate the "true" value.

However, this can be compensated by using the second auxiliary array (cf. WW below) which, after filtering with the same Gaussian, just contains the sum of coefficients.

Therefore, dividing the two filtered auxiliary arrays rescales the coefficients such that they sum up to one while the NaN positions are ignored.

array U         1   2   NaN 1   2    
auxiliary V     1   2   0   1   2    
auxiliary W     1   1   0   1   1
position        a   b   c   d   e

filtered VV_b   = 0.25*V_a  + 0.50*V_b  + 0.25*V_c
                = 0.25*1    + 0.50*2    + 0
                = 1.25

filtered WW_b   = 0.25*W_a  + 0.50*W_b  + 0.25*W_c
                = 0.25*1    + 0.50*1    + 0
                = 0.75

ratio Z         = VV_b / WW_b  
                = (0.25*1 + 0.50*2) / (0.25*1    + 0.50*1)
                = 0.333*1 + 0.666*2
                = 1.666

update - division-by-zero:

The following incorporates useful questions and answers by @AndyL and @amain from the comments below, thanks!

Large areas of NaNs may lead to a zero denominator (WW=0) at some positions when there are only NaN entries within the support of the Gaussian kernel (in theory that support is infinite, but in practice the kernel is usually truncated, see 'truncate' parameter in code example above). In that situation, the nominator becomes zero as well (VV=0) so that numpy throws a 'RuntimeWarning: invalid value encountered in true_divide' and returns NaN at the corresponding positions.

This is probably the most consistent/meaningful result and if you can live with a numpy warning, no further adjustments are required.

David
  • 1,909
  • 1
  • 20
  • 32
  • 3
    I don't know if I completely understand why, but this method works and gives almost the exact same result as `astropy.convolve` with `astropy.convolution.Gaussian2DKernel` - and 10x faster. – j08lue Oct 02 '17 at 19:33
  • 1
    This is fantastically clever -- and I also don't fully understand why it works. – Sealander Aug 06 '18 at 17:43
  • This is great! The WW is a clever way to track how much of the gaussian kernel is valid. Maybe you could make `V[U!=U]=0` into `V[np.isnan(U)]` for clarity? – Scott Staniewicz Jan 24 '19 at 18:52
  • This is called normalized convolution. – Cris Luengo Jan 29 '19 at 19:42
  • 1
    Thanks @ScottStaniewicz, I implemented you suggestion! – David Mar 05 '19 at 16:11
  • Clever trick. What happens when you get a large swath of NaNs. Will this result in a divide by zero error? Also, is this equivalent to linearly interpolating first and then convolving? – AndyL Jan 03 '20 at 16:15
  • @AndyL Theoretically, the support of the Gaussian function is infinite, so as long as there is one non-NaN entry, the denominator (called WW above) will be strictly positive. Practically, large NaN areas or small Gaussian sigma parameters and limited numerical precision may indeed lead to a denominator WW~0. However, in that case also the nominator VV~0. – David Jan 05 '20 at 16:01
  • 1
    @David Theoretically yes, but practically most implementations of gaussian filters are truncated by default. For example, scipy's `ndimage.gaussian_filter` has a default truncate value of 4.0 standard deviations. To avoid 'division-by-zero'-like numpy warnings, zeros in WW could be converted to NaNs prior to division. – amain Mar 23 '20 at 02:11
  • Due to this fact, it might be already a simple solution to adjust (decrease) the `truncate` argument in the initial gaussian kernel of your data, so that a single NaN might affect less values in your filtered output. – pas-calc Apr 14 '22 at 16:19
  • to prevent the warning `WW[WW==0]=np.nan` , anyway the result of that will be NaN – pas-calc Apr 14 '22 at 16:52
  • aside, for the top comment: astropy's convolution can perform as well as the others if you turn off padding. See https://github.com/astropy/astropy/issues/11242#issuecomment-760169792 and https://keflavich-astropy.readthedocs.io/en/convolve_fft_profiling/convolution/performance.html – keflavich Aug 26 '22 at 01:18
11

enter image description here

I stepped over this question a while ago and used davids answer (thanks!). As it turned out in the meantime, the task of applying a gaussian filter to a array with nans is not as well defined as I thought.

As descibed in ndimage.gaussian_filter, there are different options to process values at the border of the image (reflection, constant extrapolation, ...). A similar decision has to be made for the nan values in the image.

  • Some idea might be, so linarly interpolate nan values, but the question arrises, what to do with nans at the image borders.
  • filter_nan_gaussian_david: Davids approach is equivalent to assuming some mean-neighborhood-value at each nan-point. This leads to a change in the total intensity (see sum value in colum 3), but does a great job otherwise.
  • filter_nan_gaussian_conserving: This approach is to spead the intesity of each point by a gaussian filter. The intensity, which is mapped to nan-pixels is reshifted back to the origin. If this maskes sense, depends on the application. I have a closed area surronded by nans and want to preseve the total intensity + avoid distortions at the boundaries.
  • filter_nan_gaussian_conserving2: Speads intesity of each point by a gaussian filter. The intensity, which is mapped to nan-pixels is redirected to the other pixels with the same Gaussian weighting. This leads to a relative reduction of the intensity at the origin in the vicinity of many nans / border pixels. This is illustrated in the last row very right.

Code

import numpy as np
from scipy import ndimage
import matplotlib as mpl
import matplotlib.pyplot as plt

def filter_nan_gaussian_conserving(arr, sigma):
    """Apply a gaussian filter to an array with nans.

    Intensity is only shifted between not-nan pixels and is hence conserved.
    The intensity redistribution with respect to each single point
    is done by the weights of available pixels according
    to a gaussian distribution.
    All nans in arr, stay nans in gauss.
    """
    nan_msk = np.isnan(arr)

    loss = np.zeros(arr.shape)
    loss[nan_msk] = 1
    loss = ndimage.gaussian_filter(
            loss, sigma=sigma, mode='constant', cval=1)

    gauss = arr.copy()
    gauss[nan_msk] = 0
    gauss = ndimage.gaussian_filter(
            gauss, sigma=sigma, mode='constant', cval=0)
    gauss[nan_msk] = np.nan

    gauss += loss * arr

    return gauss

def filter_nan_gaussian_conserving2(arr, sigma):
    """Apply a gaussian filter to an array with nans.

    Intensity is only shifted between not-nan pixels and is hence conserved.
    The intensity redistribution with respect to each single point
    is done by the weights of available pixels according
    to a gaussian distribution.
    All nans in arr, stay nans in gauss.
    """
    nan_msk = np.isnan(arr)

    loss = np.zeros(arr.shape)
    loss[nan_msk] = 1
    loss = ndimage.gaussian_filter(
            loss, sigma=sigma, mode='constant', cval=1)

    gauss = arr / (1-loss)
    gauss[nan_msk] = 0
    gauss = ndimage.gaussian_filter(
            gauss, sigma=sigma, mode='constant', cval=0)
    gauss[nan_msk] = np.nan

    return gauss

def filter_nan_gaussian_david(arr, sigma):
    """Allows intensity to leak into the nan area.
    According to Davids answer:
        https://stackoverflow.com/a/36307291/7128154
    """
    gauss = arr.copy()
    gauss[np.isnan(gauss)] = 0
    gauss = ndimage.gaussian_filter(
            gauss, sigma=sigma, mode='constant', cval=0)

    norm = np.ones(shape=arr.shape)
    norm[np.isnan(arr)] = 0
    norm = ndimage.gaussian_filter(
            norm, sigma=sigma, mode='constant', cval=0)

    # avoid RuntimeWarning: invalid value encountered in true_divide
    norm = np.where(norm==0, 1, norm)
    gauss = gauss/norm
    gauss[np.isnan(arr)] = np.nan
    return gauss



fig, axs = plt.subplots(3, 4)
fig.suptitle('black: 0, white: 1, red: nan')
cmap = mpl.cm.get_cmap('gist_yarg_r')
cmap.set_bad('r')
def plot_info(ax, arr, col):
    kws = dict(cmap=cmap, vmin=0, vmax=1)
    if col == 0:
        title = 'input'
    elif col == 1:
        title = 'filter_nan_gaussian_conserving'
    elif col == 2:
        title = 'filter_nan_gaussian_david'
    elif col == 3:
        title = 'filter_nan_gaussian_conserving2'
    ax.set_title(title + '\nsum: {:.4f}'.format(np.nansum(arr)))
    ax.imshow(arr, **kws)

sigma = (1,1)

arr0 = np.zeros(shape=(6, 10))
arr0[2:, :] = np.nan
arr0[2, 1:3] = 1

arr1 = np.zeros(shape=(6, 10))
arr1[2, 1:3] = 1
arr1[3, 2] = np.nan

arr2 = np.ones(shape=(6, 10)) *.5
arr2[3, 2] = np.nan

plot_info(axs[0, 0], arr0, 0)
plot_info(axs[0, 1], filter_nan_gaussian_conserving(arr0, sigma), 1)
plot_info(axs[0, 2], filter_nan_gaussian_david(arr0, sigma), 2)
plot_info(axs[0, 3], filter_nan_gaussian_conserving2(arr0, sigma), 3)

plot_info(axs[1, 0], arr1, 0)
plot_info(axs[1, 1], filter_nan_gaussian_conserving(arr1, sigma), 1)
plot_info(axs[1, 2], filter_nan_gaussian_david(arr1, sigma), 2)
plot_info(axs[1, 3], filter_nan_gaussian_conserving2(arr1, sigma), 3)

plot_info(axs[2, 0], arr2, 0)
plot_info(axs[2, 1], filter_nan_gaussian_conserving(arr2, sigma), 1)
plot_info(axs[2, 2], filter_nan_gaussian_david(arr2, sigma), 2)
plot_info(axs[2, 3], filter_nan_gaussian_conserving2(arr2, sigma), 3)

plt.show()
Markus Dutschke
  • 9,341
  • 4
  • 63
  • 58
  • Is this directly applicable to higher orders of the gaussian filter ? I tried it, and several artifacts appear on the borders. – Liris Feb 20 '23 at 13:36
1

How about replacing Z=VV/WW with Z=VV/(WW+epsilon) with epsilon=0.000001 to automatically handle the cases without any observations in the previous proposal

  • I am not sure this answers the question. If Original Poster has NaNs, it will not help. An op with a nan returns a nan. Maybe you should elaborate on your answer. – Alain Merigot Jan 29 '19 at 20:29
  • Neither VV not WW contains nan. VV as well as WW will become zero, thus epsilon will make the division computable and will give a zero as a result. – Christopher Grimm Jan 30 '19 at 21:20