6

I know there is scipy.signal.convolve2d function to handle 2 dimension convolution for 2d numpy array, and there is numpy.ma module to handle missing data, but these two methods don't seem to compatible with each other (which means even if you mask a 2d array in numpy, the process in convolve2d won't be affected). Is there any way to handle missing values in convolution using only numpy and scipy packages?

For example:

            1 - 3 4 5
            1 2 - 4 5
   Array =  1 2 3 - 5
            - 2 3 4 5
            1 2 3 4 -

  Kernel =  1  0
            0 -1

Desired result for convolution(Array, Kernel, boundary='wrap'):

               -1  - -1 -1 4
               -1 -1  - -1 4
    Result =   -1 -1 -1  - 5
                - -1 -1  4 4
                1 -1 -1 -1 -

Thanks for the suggestion from Aguy, that is a really good way to help the calculation of result after convolution. Now let's say we can get the mask of Array from Array.mask, which would give us a result of

                   False True  False False False                       
                   False False True  False False
    Array.mask ==  False False False True  False
                   True  False False False False
                   False False False False True

How can I use this mask to convert the result after convolution into a masked array?

Arthur
  • 179
  • 2
  • 9
  • how is the convolution supposed to deal with missing values? I feel like `Result[0,1]` should be `0` rather than `-`... – Julien Jul 12 '16 at 00:57
  • 1
    You want to replace the masked value by 0's, then convolve, then reapply the mask to the result. – Aguy Jul 12 '16 at 03:38

3 Answers3

9

I dont think replacing with 0s is the correct way of doing this, you are nudging the covolved values toward 0. These missings should literally be treated as "missings". Because they represent the missing pieces of information and there is no justification to just assume they could be 0s, and they shouldn't be involved in any calcuation at all.

I tried setting missing values to numpy.nan and then convolve, the result suggest that any overlap between the kernel and any missing gives an nan in the result, even if the overlap is with an 0 in the kernel, so you get an enlarged hole of missings in the result. Depending on your application, this could be the desired result.

But in some cases, you don't want to discard so much information just for 1 single missing (maybe <= 50% of missing is still tolerable). In such cases, I've found another module astropy that has a better implementation: numpy.nans are ignored (or replaced with interpolated values?).

So using astropy, you would do the following:

from astropy.convolution import convolve
inarray=numpy.where(inarray.mask,numpy.nan,inarray) # masking still doesn't work, has to set to numpy.nan
result=convolve(inarray,kernel)

But still, you don't have control over how much of missing is tolerable. To achieve that, I've created a function that uses the scipy.ndimage.convolve() for the initial convolution, but manually re-compute values whenever missings (numpy.nan) are involved:

def convolve2d(slab,kernel,max_missing=0.5,verbose=True):
    '''2D convolution with missings ignored

    <slab>: 2d array. Input array to convolve. Can have numpy.nan or masked values.
    <kernel>: 2d array, convolution kernel, must have sizes as odd numbers.
    <max_missing>: float in (0,1), max percentage of missing in each convolution
                   window is tolerated before a missing is placed in the result.

    Return <result>: 2d array, convolution result. Missings are represented as
                     numpy.nans if they are in <slab>, or masked if they are masked
                     in <slab>.

    '''

    from scipy.ndimage import convolve as sciconvolve

    assert numpy.ndim(slab)==2, "<slab> needs to be 2D."
    assert numpy.ndim(kernel)==2, "<kernel> needs to be 2D."
    assert kernel.shape[0]%2==1 and kernel.shape[1]%2==1, "<kernel> shape needs to be an odd number."
    assert max_missing > 0 and max_missing < 1, "<max_missing> needs to be a float in (0,1)."

    #--------------Get mask for missings--------------
    if not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab))==False:
        has_missing=False
        slab2=slab.copy()

    elif not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab)):
        has_missing=True
        slabmask=numpy.where(numpy.isnan(slab),1,0)
        slab2=slab.copy()
        missing_as='nan'

    elif (slab.mask.size==1 and slab.mask==False) or numpy.any(slab.mask)==False:
        has_missing=False
        slab2=slab.copy()

    elif not (slab.mask.size==1 and slab.mask==False) and numpy.any(slab.mask):
        has_missing=True
        slabmask=numpy.where(slab.mask,1,0)
        slab2=numpy.where(slabmask==1,numpy.nan,slab)
        missing_as='mask'

    else:
        has_missing=False
        slab2=slab.copy()

    #--------------------No missing--------------------
    if not has_missing:
        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
    else:
        H,W=slab.shape
        hh=int((kernel.shape[0]-1)/2)  # half height
        hw=int((kernel.shape[1]-1)/2)  # half width
        min_valid=(1-max_missing)*kernel.shape[0]*kernel.shape[1]

        # dont forget to flip the kernel
        kernel_flip=kernel[::-1,::-1]

        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
        slab2=numpy.where(slabmask==1,0,slab2)

        #------------------Get nan holes------------------
        miss_idx=zip(*numpy.where(slabmask==1))

        if missing_as=='mask':
            mask=numpy.zeros([H,W])

        for yii,xii in miss_idx:

            #-------Recompute at each new nan in result-------
            hole_ys=range(max(0,yii-hh),min(H,yii+hh+1))
            hole_xs=range(max(0,xii-hw),min(W,xii+hw+1))

            for hi in hole_ys:
                for hj in hole_xs:
                    hi1=max(0,hi-hh)
                    hi2=min(H,hi+hh+1)
                    hj1=max(0,hj-hw)
                    hj2=min(W,hj+hw+1)

                    slab_window=slab2[hi1:hi2,hj1:hj2]
                    mask_window=slabmask[hi1:hi2,hj1:hj2]
                    kernel_ij=kernel_flip[max(0,hh-hi):min(hh*2+1,hh+H-hi), 
                                     max(0,hw-hj):min(hw*2+1,hw+W-hj)]
                    kernel_ij=numpy.where(mask_window==1,0,kernel_ij)

                    #----Fill with missing if not enough valid data----
                    ksum=numpy.sum(kernel_ij)
                    if ksum<min_valid:
                        if missing_as=='nan':
                            result[hi,hj]=numpy.nan
                        elif missing_as=='mask':
                            result[hi,hj]=0.
                            mask[hi,hj]=True
                    else:
                        result[hi,hj]=numpy.sum(slab_window*kernel_ij)

        if missing_as=='mask':
            result=numpy.ma.array(result)
            result.mask=mask

    return result

Below is a figure demonstrating the output. On the left is a 30x30 random map with 3 numpy.nan holes with sizes of:

  1. 1x1
  2. 3x3
  3. 5x5

enter image description here

On the right is the convolved output, by a 5x5 kernel (with all 1s), and a tolerance level of 50% (max_missing=0.5).

So the first 2 smaller holes are filled using nearby values, and in the last one, because the number of missings > 0.5x5x5 = 12.5, numpy.nans are placed to represent missing information.

Jason
  • 2,950
  • 2
  • 30
  • 50
  • Please also check out this repo (https://github.com/Xunius/Random-Fortran-scripts) where I created a python module compiled from Fortran, which would speed up things a lot. – Jason Jan 16 '17 at 09:15
  • And I updated my own solution, at here https://github.com/Xunius/python_convolution – Jason Nov 08 '17 at 09:38
  • I have two questions to @Jason 's solution: could you please post the code how you created/used your fxn? Why are you not using convolve2d and how could I use its properties like different modes, eg 'valid', ie not to keep the same shape of the image? – My Work Apr 12 '20 at 07:34
5

I found a hack. instead of nan use imaginary number (where it is nan change it to 1i) run the convolution and set that wherever the imaginary value is above a threshold it is nan. whenever it is bellow just take the real value. here is a code snippet:

frames_complex = np.zeros_like(frames_, dtype=np.complex64)
frames_complex[np.isnan(frames_)] = np.array((1j))
frames_complex[np.bitwise_not(np.isnan(frames_))] =                         
frames_[np.bitwise_not(np.isnan(frames_))]
convolution = signal.convolve(frames_complex, gaussian_window, 'valid')
convolution[np.imag(convolution)>0.2] = np.nan
convolution = convolution.astype(np.float32)
3

Base on the idea from Ilan Schvartzman in a previous answer here an improved version. In addition, it can compensate that missing values are set to 0 (in real space) and it supports normalization to np.sum(in2). Both are adjustable with the parameters correct_missing, and norm, respectively. For a 1d version, simply replace scipy.signal.convolve2d with scipy.signal.convolve.

import scipy.signal
import numpy as np

def masked_convolve2d(in1, in2, correct_missing=True, norm=True, valid_ratio=1./3., *args, **kwargs):
    """A workaround for np.ma.MaskedArray in scipy.signal.convolve. 
    It converts the masked values to complex values=1j. The complex space allows to set a limit
    for the imaginary convolution. The function use a ratio `valid_ratio` of np.sum(in2) to 
    set a lower limit on the imaginary part to mask the values.
    I.e. in1=[[1.,1.,--,--]] in2=[[1.,1.]] -> imaginary_part/sum(in2): [[1., 1., .5, 0.]]
    -> valid_ratio=.5 -> out:[[1., 1., .5, --]].
    PARAMETERS
    ---------
    in1 : array_like
        First input.
    in2 : array_like
        Second input. Should have the same number of dimensions as `in1`.
    correct_missing : bool, optional
        correct the value of the convolution as a sum over valid data only, 
        as masked values account 0 in the real space of the convolution.
    norm : bool, optional
        if the output should be normalized to np.sum(in2).
    valid_ratio: float, optional
        the upper limit of the imaginary convolution to mask values. Defined by the ratio of np.sum(in2).
    *args, **kwargs: optional
        parsed to scipy.signal.convolve(..., *args, **kwargs)
    """
    if not isinstance(in1, np.ma.MaskedArray):
        in1 = np.ma.array(in1)
    
    # np.complex128 -> stores real as np.float64
    con = scipy.signal.convolve2d(in1.astype(np.complex128).filled(fill_value=1j), 
                                  in2.astype(np.complex128), 
                                  *args, **kwargs
                                 )
    
    # split complex128 to two float64s
    con_imag = con.imag
    con = con.real
    mask = np.abs(con_imag/np.sum(in2)) > valid_ratio
    
    # con_east.real / (1. - con_east.imag): correction, to get the mean over all valid values
    # con_east.imag > percent: how many percent of the single convolution value have to be from valid values
    if correct_missing:
        correction = np.sum(in2) - con_imag
        con[correction!=0] *= np.sum(in2)/correction[correction!=0]
        
    if norm:
        con /= np.sum(in2)
        
    return np.ma.array(con, mask=mask)

Example

An Example to show the difference for correct_missing with the inputs:

in1 = np.ones((1, 6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)

in2 = np.ones((1, 4))

in1
>>> array([[1.0, 1.0, 1.0, 1.0, --, --]])

The masked convolution with correct_missing is:

masked_convolve2d(in1, in2, correct_missing=True, mode='valid', norm=True)
>>> masked_array(data=[[1.0, 1.0, --]],
                 mask=[[False, False,  True]])

Without correction and if you want to fill the masked values with np.nan:

b = masked_convolve2d(in1, in2, correct_missing=False, mode='valid', norm=True)
b.filled(np.nan)
>>> array([[1.  , 0.75,  nan]]))

Performance test

I tested my version (masked_convolve2d) against Jason's code (convolve2d) based on the inputs:

in1 = np.ones((10,6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)

in2 = np.ones((3,3))  # <kernel> shape needs to be an odd number for Jason's code

with the following results on my machine:

%timeit -n 10 -r 10 convolve2d(in1, in2)
>>> 2.44 ms ± 117 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

%timeit -n 10 -r 10 masked_convolve2d(in1, in2)
>>> 131 µs ± 19.1 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
Andrew
  • 817
  • 4
  • 9
  • ALMOST what I need! However, I wonder how to make it work with a kernel containing negative entries (e.g. when you have a gradient kernel [1, 0, -1]). In that case the normalization does not work. I thought perhaps using the sum of absolute values would help, but not really. Any ideas? – nicrie Jan 27 '23 at 15:41
  • You could split it up in two convolutions. First a normal convolution of your data and kernel filled with 0 at masked values. And a second, where you calculate the overlap of the kernel with valid data, e.g.: `masked_convolve2d(np.ones_like(data), np.ones_like(kernal))`. – Andrew Jan 27 '23 at 16:32