
I am trying to implement a convolutional layer in Python using NumPy. The input is a 4-dimensional array of shape [N, H, W, C], where:

  • N: Batch size
  • H: Height of image
  • W: Width of image
  • C: Number of channels

The convolutional filter is also a 4-dimensional array of shape [F, F, Cin, Cout], where:

  • F: Height and width of a square filter
  • Cin: Number of input channels (Cin = C)
  • Cout: Number of output channels

Assuming a stride of one along all axes, and no padding, the output should be a 4-dimensional array of shape [N, H - F + 1, W - F + 1, Cout].
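
For example, a 28 × 28 input convolved with a 5 × 5 filter yields a 24 × 24 output.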

My code is as follows:

import numpy as np

def conv2d(image, filter):
  # Height and width of output image
  Hout = image.shape[1] - filter.shape[0] + 1
  Wout = image.shape[2] - filter.shape[1] + 1

  output = np.zeros([image.shape[0], Hout, Wout, filter.shape[3]])

  for n in range(output.shape[0]):
    for i in range(output.shape[1]):
      for j in range(output.shape[2]):
        for cout in range(output.shape[3]):
          output[n,i,j,cout] = np.multiply(image[n, i:i+filter.shape[0], j:j+filter.shape[1], :], filter[:,:,:,cout]).sum()

  return output

This works perfectly, but uses four for loops and is extremely slow. Is there a better way of implementing a convolutional layer that takes a 4-dimensional input and filter, and returns a 4-dimensional output, using NumPy?

Dr. Prasanna Date
  • I'm having difficulties trying to reproduce this. Can you give a sample filter? Judging by `filter.shape[3]`, is it 4-dimensional? – yatu May 11 '19 at 15:44
  • The filter is 4-dimensional. A sample filter could be `filter = np.random.randint(0, 2, [5, 5, 3, 16])`. This would be a 5 × 5 filter that operates on a three-channel input image and generates an output 'image' with 16 channels. – Dr. Prasanna Date May 11 '19 at 16:12
  • Okay, will give it a look when I have some time – yatu May 11 '19 at 17:07

1 Answer


This is a straightforward implementation of this kind of Keras-like (?) convolution. It might be hard for beginners to understand because it uses a lot of broadcasting and stride tricks.

from numpy.lib.stride_tricks import as_strided

def conv2d(a, b):
    # Build a zero-copy 6-D view of shape [N, Hout, Wout, F, F, C] over the
    # input, reusing the input's own strides for the two window axes.
    a = as_strided(a, (len(a), a.shape[1]-len(b)+1, a.shape[2]-b.shape[1]+1, len(b), b.shape[1], a.shape[3]), a.strides[:3]+a.strides[1:])
    # Contract the trailing [F, F, C] axes against the flipped kernel.
    return np.einsum('abcijk,ijkd', a, b[::-1,::-1])
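
For example, with the 5 × 5 × 3 × 16 sample filter from the question's comments (illustrative values, not part of the original answer):

a = np.random.rand(2, 28, 28, 3)   # batch of two 28 x 28 RGB images
b = np.random.rand(5, 5, 3, 16)
print(conv2d(a, b).shape)          # (2, 24, 24, 16)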

BTW: if you are doing convolution with a very big kernel, use a Fourier-based algorithm instead.
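
If you want to go that route, here is a minimal sketch of the Fourier approach using `scipy.signal.fftconvolve` (the name `conv2d_fft` and the per-channel loops are illustrative assumptions, not part of the original answer):

import numpy as np
from scipy.signal import fftconvolve  # assumes SciPy is available

def conv2d_fft(image, kernel):
    # image: [N, H, W, Cin], kernel: [F, F, Cin, Cout]
    N, H, W, Cin = image.shape
    F, _, _, Cout = kernel.shape
    out = np.zeros((N, H - F + 1, W - F + 1, Cout))
    for cin in range(Cin):
        for cout in range(Cout):
            # fftconvolve flips the kernel (true convolution); pre-flipping
            # it here yields TF-style cross-correlation, matching the
            # question's loop implementation.
            k = kernel[::-1, ::-1, cin, cout]
            out[:, :, :, cout] += fftconvolve(image[:, :, :, cin], k[None], mode='valid')
    return out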

EDIT: The `[::-1,::-1]` should be removed when the convolution is not supposed to flip the kernel first (as in TensorFlow).

EDIT: `np.tensordot(a, b, axes=3)` performs much better than `np.einsum("abcijk,ijkd", a, b)` and is highly recommended. So the function becomes:

from numpy.lib.stride_tricks import as_strided

def conv2d(a, b):
  Hout = a.shape[1] - b.shape[0] + 1
  Wout = a.shape[2] - b.shape[1] + 1

  a = as_strided(a, (a.shape[0], Hout, Wout, b.shape[0], b.shape[1], a.shape[3]), a.strides[:3] + a.strides[1:])

  return np.tensordot(a, b, axes=3)
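
To sanity-check this against the four-loop implementation from the question (renamed `conv2d_loops` here to avoid the name clash; a quick sketch):

a = np.random.rand(2, 10, 10, 3)
b = np.random.rand(5, 5, 3, 16)
# conv2d_loops is the original four-loop version from the question
print(np.allclose(conv2d(a, b), conv2d_loops(a, b)))  # True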
ZisIsNotZis
  • Thank you for the answer. However, this does not seem to produce the correct result. For example, I compared it with `tensorflow.conv2d(a, b, strides=[1,1,1,1], padding="VALID")`, and the results are different. The output tensors have the same shape, but the values are different. Any idea why that could be the case? – Dr. Prasanna Date May 13 '19 at 23:19
  • I found the problem. It seems to be related to the definition of **convolve**. For `[1,2,3]` convolving with `[1,2,3]`, in my understanding, it should be `1*3+2*2+3*1` (otherwise it's called correlate). But tensorflow seems to have defined convolution as `1*1+2*2+3*3`. Anyway, to get the result of tensorflow, simply remove the `[::-1,::-1]` in the last line. – ZisIsNotZis May 14 '19 at 02:35
  • The method seemed interesting, so I evaluated the performance of all 3 variants (TF using CPU only). Here is the result: https://gist.github.com/prabindh/beea050b379e6e9a52699b66d7ce227f – Prabindh May 14 '19 at 05:19
  • @Prabindh Yeah, generally `numpy` should not be slower than `tensorflow` on the same CPU under a similar algorithm implementation, and this implementation is basically equivalent to the pseudocode in the `tf.nn.conv2d` documentation: generate a virtual matrix in no time (except here it is a 6-tensor), and do right multiplication (in `einsum`). You can fix the differing-results problem by removing the `[::-1,::-1]` – ZisIsNotZis May 14 '19 at 05:48
  • Thank you for your contributions @ZisIsNotZis and @Prabindh. Would suggest using `np.tensordot(a, b, axes=3)` instead of `np.einsum("abcijk,ijkd", a, b)`. I noticed the former works much faster than the latter, especially on increasingly larger datasets. – Dr. Prasanna Date May 14 '19 at 21:28
  • Is it possible to upgrade this method to be able to use strides? – Nihar Karve May 09 '20 at 12:36
  • Also very interested in this - How can one change this to allow for strides? – EliteKaffee Dec 26 '20 at 16:47