I am trying to implement a convolutional layer in Python using NumPy.
The input is a 4-dimensional array of shape [N, H, W, C], where:

N: Batch size
H: Height of image
W: Width of image
C: Number of channels
The convolutional filter is also a 4-dimensional array of shape [F, F, Cin, Cout], where:

F: Height and width of a square filter
Cin: Number of input channels (Cin = C)
Cout: Number of output channels
Assuming a stride of one along all axes and no padding, the output should be a 4-dimensional array of shape [N, H - F + 1, W - F + 1, Cout].
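To make the expected shapes concrete, here is a small sanity check with made-up numbers (not my actual data):

N, H, W, C = 2, 5, 5, 3            # illustrative input shape [N, H, W, C]
F, Cout = 3, 8                     # illustrative filter shape [F, F, Cin, Cout], with Cin = C
Hout, Wout = H - F + 1, W - F + 1  # stride 1, no padding
print([N, Hout, Wout, Cout])       # [2, 3, 3, 8]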
My code is as follows:
import numpy as np

def conv2d(image, filter):
    # Height and width of output image (stride 1, no padding)
    Hout = image.shape[1] - filter.shape[0] + 1
    Wout = image.shape[2] - filter.shape[1] + 1
    output = np.zeros([image.shape[0], Hout, Wout, filter.shape[3]])

    for n in range(output.shape[0]):                 # batch
        for i in range(output.shape[1]):             # output row
            for j in range(output.shape[2]):         # output column
                for cout in range(output.shape[3]):  # output channel
                    # Element-wise product of the F x F x Cin image patch with
                    # the corresponding filter, summed to a single scalar
                    output[n, i, j, cout] = np.multiply(
                        image[n, i:i + filter.shape[0], j:j + filter.shape[1], :],
                        filter[:, :, :, cout],
                    ).sum()
    return output
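For reference, a minimal call that exercises the function (the shapes below are arbitrary) produces the expected output shape:

image = np.random.rand(2, 5, 5, 3)   # [N, H, W, C]
filt = np.random.rand(3, 3, 3, 8)    # [F, F, Cin, Cout], with Cin = C = 3
out = conv2d(image, filt)
print(out.shape)                     # (2, 3, 3, 8), i.e. (N, H - F + 1, W - F + 1, Cout)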
This works correctly, but it uses four nested for loops and is extremely slow. Is there a better way of implementing a convolutional layer that takes a 4-dimensional input and filter and returns a 4-dimensional output, using NumPy?