
I tried to implement strided convolution of a 2D array using for loops, i.e.:

import numpy as np

arr = np.array([[2,3,7,4,6,2,9],
                [6,6,9,8,7,4,3],
                [3,4,8,3,8,9,7],
                [7,8,3,6,6,3,4],
                [4,2,1,8,3,4,6],
                [3,2,4,1,9,8,3],
                [0,1,3,9,2,1,4]])

arr2 = np.array([[3,4,4],
                 [1,0,2],
                 [-1,0,3]])

def stride_conv(arr1,arr2,s,p):
    beg = 0
    end = arr2.shape[0]
    final = []
    for i in range(0,arr1.shape[0]-1,s):
        k = []
        for j in range(0,arr1.shape[0]-1,s):
            k.append(np.sum(arr1[beg+i : end+i, beg+j:end+j] * (arr2)))
        final.append(k)

    return np.array(final)

stride_conv(arr,arr2,2,0)

This results in a 3x3 array:

array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])

Is there a numpy function or scipy function to do the same? My approach is not that good. How can I vectorize this?

Bharath M Shetty
  • And you can't use scipy 2D conv? – Divakar Jan 04 '18 at 15:01
  • @Divakar I just learnt about convolutions yesterday; scipy's 2D conv has no parameter for the number of strides. If there is one, I'm afraid I need to do a bit more research. – Bharath M Shetty Jan 04 '18 at 15:03
  • And arg `p` is not used? – Divakar Jan 04 '18 at 15:06
  • @Divakar It's incomplete; I wanted to use padding, but all I learnt was doing it without padding. I just kept the parameter for future updates. – Bharath M Shetty Jan 04 '18 at 15:07
  • Possibly [this blog post](https://wiseodd.github.io/techblog/2016/07/16/convnet-conv-layer/) can help you out. Citing: _"Let’s say we have a single image of `1x1x10x10` size and a single filter of `1x1x3x3`. ... Then, naively, if we’re going to do convolution operation for our filter on the image, we will loop over the image, and take the dot product at each ..."_ & _"But, what if we don’t want to do the loop? ... What we need is to gather all the possible locations that we can apply our filter at, then do a single matrix multiplication to get the dot product at each of those possible locs."_ – dfrib Jan 04 '18 at 15:27
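A minimal sketch of the im2col idea quoted in the comment above (the helper name and layout are illustrative, not taken from the blog post): gather every stride-spaced window into the rows of a matrix, then a single matrix-vector product yields all the window dot products at once.

import numpy as np

def im2col_conv(arr, kernel, s):
    # collect each stride-spaced window as one flattened row
    m, n = kernel.shape
    rows = [arr[i:i+m, j:j+n].ravel()
            for i in range(0, arr.shape[0]-m+1, s)
            for j in range(0, arr.shape[1]-n+1, s)]
    # a single matrix-vector product computes every window's dot product
    out_h = 1 + (arr.shape[0]-m)//s
    return (np.array(rows) @ kernel.ravel()).reshape(out_h, -1)

On the question's arrays, im2col_conv(arr, arr2, 2) reproduces the 3x3 loop output above.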

6 Answers


Ignoring the padding argument and any trailing windows that don't have enough length for convolution against the second array, here's one way with np.lib.stride_tricks.as_strided -

def strided4D(arr,arr2,s):
    strided = np.lib.stride_tricks.as_strided
    s0,s1 = arr.strides
    m1,n1 = arr.shape
    m2,n2 = arr2.shape
    # one window per stride step along each axis; each window is (m2,n2)
    out_shp = (1+(m1-m2)//s, 1+(n1-n2)//s, m2, n2)
    return strided(arr, shape=out_shp, strides=(s*s0,s*s1,s0,s1))

def stride_conv_strided(arr,arr2,s):
    arr4D = strided4D(arr,arr2,s=s)
    return np.tensordot(arr4D, arr2, axes=((2,3),(0,1)))
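For instance, on the question's arrays this reproduces the loop output (with s=2 there are 1+(7-3)//2 = 3 windows per axis):

>>> stride_conv_strided(arr, arr2, 2)
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])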

Alternatively, we can use the scikit-image built-in view_as_windows to get those windows elegantly, like so -

from skimage.util.shape import view_as_windows

def strided4D_v2(arr,arr2,s):
    return view_as_windows(arr, arr2.shape, step=s)
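The view has the window axes last, so the same tensordot reduction applies; on the question's arrays:

>>> np.tensordot(strided4D_v2(arr, arr2, 2), arr2, axes=((2,3),(0,1)))
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])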
Divakar
  • Strided4D is so beautiful, sir; a small explanation of your approach would be really helpful. – Bharath M Shetty Jan 04 '18 at 15:37
  • Sir, the out_shp of strided4D, is that a formula of some kind? – Bharath M Shetty Jan 04 '18 at 15:41
  • @Dark Yeah, it's usually `1+(m1-m2)`. Incorporating the stepsize, we need that division. Think of it like we are going m2 backwards along the length of `m1` and then seeing how many windows can fit in from the start given that stepsize. – Divakar Jan 04 '18 at 15:49
  • There's so much to understand; I have to analyse every variable and every operation. Thank you. – Bharath M Shetty Jan 04 '18 at 15:51
  • @Dark That scikit version does everything for you, so that could be the easiest way. – Divakar Jan 04 '18 at 15:52
  • Sir, I chose numpy so I can understand the core concepts. I will use scikit once I have completely understood the topic. – Bharath M Shetty Jan 04 '18 at 15:53
  • @Divakar Is there a reason for using np.tensordot rather than `c = np.sum(arr4D * arr2, axis=(2, 3))`? Is it because your step is 2x2? – NaN Jan 04 '18 at 17:35
  • @NaN Yeah, `tensordot` leverages BLAS, which should be better than any sum-based method. – Divakar Jan 04 '18 at 19:31
  • Is it a convention to use `(1+(m1-m2)//s, m2, 1+(n1-n2)//s, n2)` instead of `(1+(m1-m2)//s, 1+(n1-n2)//s, m2, n2)`? – Jason Feb 26 '18 at 13:52
  • @Jason From what I remember, that output shape depends on how we are striding, i.e. the input to the `strides` argument, and these two depend on how we would like to get the sliding windows. So that's the only rule in it. – Divakar Feb 26 '18 at 13:58
  • @Divakar So if I understand you correctly, `(1+(m1-m2)//s, m2, 1+(n1-n2)//s, n2)` should be paired with `strides=(s*s0,s0,s*s1,s1)`, and `(1+(m1-m2)//s, 1+(n1-n2)//s, m2, n2)` with `strides=(s*s0,s*s1,s0,s1)`? I tested on a matrix and the first top-left window didn't match under the other pairing. – Jason Feb 26 '18 at 14:10
  • @Jason The windows would be obtained in a different format. That's what I meant when I said "these two depend on how we would like to get the sliding windows". With those changes, to get the windows as listed in the given approach, we would need to swap axes with `.swapaxes(1,2)`, or specify correspondingly different axes in the tensordot later on. – Divakar Feb 26 '18 at 14:30
  • @Divakar The pairing of `(1+(m1-m2)//s, m2, 1+(n1-n2)//s, n2)` with `strides=(s*s0,s*s1,s0,s1)` only worked on shapes like the 7x7 `arr` here (which happens to give `1+(m1-m2)//s = 3 =` kernel size); sometimes `arr4D` would even have `nan`s in it. The shape `(1+(m1-m2)//s, 1+(n1-n2)//s, m2, n2)` used above gives the correct match for the tensordot axes (`(2,3),(0,1)`). I think @NaN also meant to ask about this, because his `np.sum()` approach also works with it. – Jason Mar 02 '18 at 11:54
  • The shape/strides pairing from Jason's comment above is required for this to work in the general case. – Nihar Karve May 09 '20 at 16:57

I think we can do a "valid" FFT convolution and pick out only the results at the strided locations, like this:

import numpy as np
import scipy.signal

def strideConv(arr,arr2,s):
    cc=scipy.signal.fftconvolve(arr,arr2[::-1,::-1],mode='valid')
    idx=(np.arange(0,cc.shape[1],s), np.arange(0,cc.shape[0],s))
    xidx,yidx=np.meshgrid(*idx)
    return cc[yidx,xidx]

This gives the same results as the other answers, but I guess it only works when the kernel size is an odd number.

Also, I've flipped the kernel via arr2[::-1,::-1] just to stay consistent with the others; you may want to omit the flip depending on context.
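For the question's arrays (fftconvolve returns floats, so round and cast for integer input):

>>> strideConv(arr, arr2, 2).round().astype(int)
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])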

UPDATE:

We now have several different ways of doing 2D or 3D convolution using numpy and scipy alone, so I did some comparisons to give an idea of which one is faster on data of different sizes. I hope this won't be regarded as off-topic.

Method 1: FFT convolution (using scipy.signal.fftconvolve):

import numpy
from scipy.signal import fftconvolve

def padArray(var,pad,method=1):
    if method==1:
        var_pad=numpy.zeros(tuple(2*pad+numpy.array(var.shape[:2]))+var.shape[2:])
        var_pad[pad:-pad,pad:-pad]=var
    else:
        var_pad=numpy.pad(var,([pad,pad],[pad,pad])+([0,0],)*(numpy.ndim(var)-2),
                mode='constant',constant_values=0)
    return var_pad

def conv3D(var,kernel,stride=1,pad=0,pad_method=1):
    '''3D convolution using scipy.signal.fftconvolve.
    '''
    var_ndim=numpy.ndim(var)
    kernel_ndim=numpy.ndim(kernel)
    stride=int(stride)

    if var_ndim<2 or var_ndim>3 or kernel_ndim<2 or kernel_ndim>3:
        raise Exception("<var> and <kernel> dimensions should be 2 or 3.")

    if var_ndim==2 and kernel_ndim==3:
        raise Exception("<kernel> dimension > <var>.")

    if var_ndim==3 and kernel_ndim==2:
        kernel=numpy.repeat(kernel[:,:,None],var.shape[2],axis=2)

    if pad>0:
        var_pad=padArray(var,pad,pad_method)
    else:
        var_pad=var

    conv=fftconvolve(var_pad,kernel,mode='valid')

    if stride>1:
        conv=conv[::stride,::stride,...]

    return conv
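As a sanity check on the question's 2D example (fftconvolve flips the kernel, so it is pre-flipped here to match the question's correlation-style loop):

>>> conv3D(arr, arr2[::-1, ::-1], stride=2).round().astype(int)
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])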

Method 2: Special conv (see this answer):

def conv3D2(var,kernel,stride=1,pad=0):
    '''3D convolution by sub-matrix summing.
    '''
    var_ndim=numpy.ndim(var)
    ny,nx=var.shape[:2]
    ky,kx=kernel.shape[:2]

    result=0

    if pad>0:
        var_pad=padArray(var,pad,1)
    else:
        var_pad=var

    for ii in range(ky*kx):
        yi,xi=divmod(ii,kx)
        slabii=var_pad[yi:2*pad+ny-ky+yi+1:1, xi:2*pad+nx-kx+xi+1:1,...]*kernel[yi,xi]
        if var_ndim==3:
            slabii=slabii.sum(axis=-1)
        result+=slabii

    if stride>1:
        result=result[::stride,::stride,...]

    return result
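The same check for this version (it correlates directly, so no kernel flip is needed):

>>> conv3D2(arr, arr2, stride=2)
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])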

Method 3: Strided-view conv, as suggested by Divakar:

def asStride(arr,sub_shape,stride):
    '''Get a strided sub-matrices view of an ndarray.

    <arr>: ndarray of rank 2.
    <sub_shape>: tuple of length 2, window size: (ny, nx).
    <stride>: int, stride of windows.

    Return <subs>: strided window view.

    See also skimage.util.shape.view_as_windows()
    '''
    s0,s1=arr.strides[:2]
    m1,n1=arr.shape[:2]
    m2,n2=sub_shape[:2]

    view_shape=(1+(m1-m2)//stride,1+(n1-n2)//stride,m2,n2)+arr.shape[2:]
    strides=(stride*s0,stride*s1,s0,s1)+arr.strides[2:]
    subs=numpy.lib.stride_tricks.as_strided(arr,view_shape,strides=strides)

    return subs

def conv3D3(var,kernel,stride=1,pad=0):
    '''3D convolution by strided view.
    '''
    var_ndim=numpy.ndim(var)
    kernel_ndim=numpy.ndim(kernel)

    if var_ndim<2 or var_ndim>3 or kernel_ndim<2 or kernel_ndim>3:
        raise Exception("<var> and <kernel> dimensions should be 2 or 3.")

    if var_ndim==2 and kernel_ndim==3:
        raise Exception("<kernel> dimension > <var>.")

    if var_ndim==3 and kernel_ndim==2:
        kernel=numpy.repeat(kernel[:,:,None],var.shape[2],axis=2)

    if pad>0:
        var_pad=padArray(var,pad,1)
    else:
        var_pad=var

    view=asStride(var_pad,kernel.shape,stride)
    #return numpy.tensordot(view,kernel,axes=((2,3),(0,1)))
    if numpy.ndim(kernel)==2:
        conv=numpy.sum(view*kernel,axis=(2,3))
    else:
        conv=numpy.sum(view*kernel,axis=(2,3,4))

    return conv
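And the same check for the strided-view version (again a direct correlation, so no flip), which agrees with the two methods above:

>>> conv3D3(arr, arr2, stride=2)
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])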

I did 3 sets of comparisons:

Comparison 1: convolution on 2D data, with different input sizes and different kernel sizes, stride=1, pad=0. Results below (color indicates the time used for convolution, repeated 10 times):

[Figure: timing comparison of the three methods on 2D data across input and kernel sizes; the last subplot marks the fastest method.]

So "FFT conv" is in general the fastest. "Special conv" and "Stride-view conv" get slower as the kernel size increases, but speed up again as it approaches the size of the input data. The last subplot shows the fastest method at each size: the big purple triangle indicates FFT being the winner, but note there is a thin green column on the left side (probably too small to see, but it's there), suggesting that "Special conv" has the advantage for very small kernels (smaller than about 5x5). And when the kernel size approaches the input size, "Stride-view conv" is fastest (see the diagonal line).

Comparison 2: convolution on 3D data.

Setup: pad=0, stride=2, input dimension=nxnx5, kernel shape=fxfx5.

I skipped computations of "Special conv" and "Stride-view conv" when the kernel size lies in the middle of the input-size range. Basically "Special conv" shows no advantage now, and "Stride-view conv" is faster than FFT for both small and large kernels.

[Figure: timing comparison on 3D data with stride=2.]

One additional note: when sizes go above 350, I noticed considerable memory-usage peaks for "Stride-view conv".

Comparison 3: convolution on 3D data with larger stride.

Setup: pad=0, stride=5, input dimension=nxnx10, kernel shape=fxfx10.

This time I omitted "Special conv". Over a larger region "Stride-view conv" surpasses FFT, and the last subplot shows that the difference approaches 100%. This is probably because as the stride goes up, the FFT approach computes more wasted numbers, so "Stride-view conv" gains more of an advantage for both small and large kernels.

[Figure: timing comparison on 3D data with stride=5.]

Jason

How about using signal.convolve2d from scipy?

My approach is similar to Jason's, but uses slicing to pick out the strided locations.

from scipy import signal

def strideConv(arr, arr2, s):
    return signal.convolve2d(arr, arr2[::-1, ::-1], mode='valid')[::s, ::s]

Note that the kernel has to be reversed. For details, please see the discussions here and here. Otherwise, use signal.correlate2d.

Examples:

 >>> strideConv(arr, arr2, 1)
 array([[ 91,  80, 100,  84,  88],
        [ 99, 106, 126,  92,  77],
        [ 69,  98,  91,  93, 117],
        [ 80,  79,  87,  93,  61],
        [ 44,  72,  72,  63,  74]])
 >>> strideConv(arr, arr2, 2)
 array([[ 91, 100,  88],
        [ 69,  91, 117],
        [ 44,  72,  74]])
pe-perry
  • 6
    Is there anyway to do this without wasted calculations? Like in this example, a bunch of useless calculations are done between the required strides – Recessive Feb 07 '20 at 03:27

Here is an O(N^d (log N)^d) FFT-based approach. The idea is to chop up both operands into stride-spaced grids at all offsets modulo the strides, do a conventional FFT convolution between grids of corresponding offsets, and then pointwise sum the results. It is a bit index-heavy, but I'm afraid that can't be helped:

import numpy as np
from numpy.fft import fftn, ifftn

def strided_conv_2d(x, y, strides):
    s, t = strides
    # consensus dtype
    cdt = (x[0, 0, ...] + y[0, 0, ...]).dtype
    xi, xj = x.shape
    yi, yj = y.shape
    # round up modulo strides
    xk, xl, yk, yl = map(lambda a, b: -a//b * -b, (xi,xj,yi,yj), (s,t,s,t))
    # zero pad to avoid circular convolution
    xp, yp = (np.zeros((xk+yk, xl+yl), dtype=cdt) for i in range(2))
    xp[:xi, :xj] = x
    yp[:yi, :yj] = y
    # fold out strides
    xp = xp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
    yp = yp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
    # do conventional fft convolution
    xf = fftn(xp, axes=(0, 2))
    yf = fftn(yp, axes=(0, 2))
    result = ifftn(xf * yf.conj(), axes=(0, 2)).sum(axis=(1, 3))
    # restore dtype
    if cdt in (int, np.int_, np.int64, np.int32):
        result = result.real.round()
    return result.astype(cdt)

arr = np.array([[2,3,7,4,6,2,9],
                [6,6,9,8,7,4,3],
                [3,4,8,3,8,9,7],
                [7,8,3,6,6,3,4],
                [4,2,1,8,3,4,6],
                [3,2,4,1,9,8,3],
                [0,1,3,9,2,1,4]])

arr2 = np.array([[3,4,4],
                 [1,0,2],
                 [-1,0,3]])

print(strided_conv_2d(arr, arr2, (2, 2)))

Result:

[[ 91 100  88  23   0  29]
 [ 69  91 117  19   0  38]
 [ 44  72  74  17   0  22]
 [ 16  53  26  12   0   0]
 [  0   0   0   0   0   0]
 [ 19  11  21  -9   0   6]]
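The strided "valid" outputs from the question occupy the top-left 1+(m1-m2)//s by 1+(n1-n2)//s block (here 3x3), so they can be recovered by slicing:

>>> strided_conv_2d(arr, arr2, (2, 2))[:3, :3]
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])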
Paul Panzer

As far as I know, there is no direct implementation of a convolution filter in numpy or scipy that supports stride and padding, so I think it's better to use a DL package such as torch or tensorflow and then cast the final result back to numpy. A torch implementation might be:

import numpy as np
import torch
import torch.nn.functional as F

# F.conv2d expects float tensors of shape (batch, channels, H, W);
# note that it computes cross-correlation, which matches the question's loop
x = torch.tensor(np.expand_dims(arr, axis=(0, 1)), dtype=torch.float32)
w = torch.tensor(np.expand_dims(arr2, axis=(0, 1)), dtype=torch.float32)
output = F.conv2d(x, w, stride=2, padding=0)
output = output.numpy().squeeze().astype(int)

>>> output
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])
Sajad.sni

Here is a convolution that supports strides and dilation, using numpy.lib.stride_tricks.as_strided:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv_view(X, F_s, dr, std):
    '''Get a strided, dilated sub-matrix view of X for a filter of shape F_s.'''
    X_s = np.array(X.shape)
    F_s = np.array(F_s)
    dr = np.array(dr)
    # effective filter size after dilation
    Fd_s = (F_s - 1) * dr + 1
    if np.any(Fd_s > X_s):
        raise ValueError('(Dilated) filter size must be smaller than X')
    std = np.array(std)
    X_ss = np.array(X.strides)
    # number of window positions along each axis
    Xn_s = (X_s - Fd_s) // std + 1
    Xv_s = np.append(Xn_s, F_s)
    # window-hop strides first, then within-window (dilated) element strides
    Xv_ss = np.tile(X_ss, 2) * np.append(std, dr)
    return as_strided(X, Xv_s, Xv_ss, writeable=False)

def convolve_stride(X, F, dr=None, std=None):
    if dr is None:
        dr = np.ones(X.ndim, dtype=int)
    if std is None:
        std = np.ones(X.ndim, dtype=int)
    if not (X.ndim == F.ndim == len(dr) == len(std)):
        raise ValueError('X.ndim, F.ndim, len(dr), len(std) must be the same')
    Xv = conv_view(X, F.shape, dr, std)
    return np.tensordot(Xv, F, axes=X.ndim)
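A usage sketch on the question's arrays (note the tensordot contracts without flipping the kernel, i.e. it computes cross-correlation, which matches the question's loop):

>>> convolve_stride(arr, arr2, std=(2, 2))
array([[ 91, 100,  88],
       [ 69,  91, 117],
       [ 44,  72,  74]])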
lovetl2002