ND Convolution Backpropagation

Question

For my education, I am trying to implement an N-dimensional convolutional layer in a convolutional neural network.

I would like to implement a backpropagation function. However, I am not sure of the most efficient way of doing so.

At present, I am using signal.fftconvolve to:

In the forward step, convolve the input with each filter's kernel;
In the backpropagation step, convolve the derivatives (reversed in all dimensions with the FlipAllAxes function) with the array (https://jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/) over all filters and sum them. I take the output to be the sum of each image convolved with each derivative for each filter.
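Reversing an array in all dimensions, as FlipAllAxes does, is not the same as transposing it. The quick NumPy check below illustrates this; `flip_all_axes` is a hypothetical stand-alone copy of FlipAllAxes, and `np.flip` with no `axis` argument flips every axis.

```python
import numpy as np

def flip_all_axes(array):
    # Reverse the element order along every axis (same slicing as FlipAllAxes)
    return array[(slice(None, None, -1),) * array.ndim]

kernel = np.arange(27).reshape(3, 3, 3)
flipped = flip_all_axes(kernel)

# np.flip(kernel) flips all axes, so it matches flip_all_axes
same_as_np_flip = np.array_equal(flipped, np.flip(kernel))
# kernel.T reverses the *order of the axes* (a transpose); it does not flip them
same_as_transpose = np.array_equal(flipped, kernel.T)

print(same_as_np_flip, same_as_transpose)  # True False
```

For example, `flipped[0, 0, 0]` is `kernel[2, 2, 2]` (26), while `kernel.T[0, 0, 0]` is still `kernel[0, 0, 0]` (0).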

I am particularly confused about how to convolve the derivatives. Using the class below to backpropagate results in an explosion in the size of the weights.

What is the correct way to program the convolution of the derivative with the output and filters?

EDIT:

According to this paper (Fast Training of Convolutional Networks through FFTs), which seeks to do exactly what I wish to do:

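The derivatives for the previous layer are given by the convolution of the derivatives of the current layer with the weights, summed over the filters:

$$\frac{\partial L}{\partial x} = \sum_f \frac{\partial L}{\partial y_f} * w_f^T$$

where x is the layer input, y_f = x * w_f is the output of filter f, * denotes convolution and w_f^T is w_f flipped along every axis.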
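A minimal sketch of this input-gradient formula, checked against finite differences. It assumes, as in the question's layer, a single-channel N-D input, filters stacked along axis 0, a forward pass of `signal.fftconvolve(x, w_f, "same")`, and odd kernel sizes (for "same" mode the flipped-kernel convolution lines up exactly only in that case). The helper names `forward` and `input_grad` are hypothetical. The check uses L = sum_f sum(y_f * g_f), for which dL/dy_f = g_f.

```python
import numpy as np
from scipy import signal

def forward(x, filters):
    # y[f] = x convolved with filter f; 'same' mode keeps x's shape
    return np.stack([signal.fftconvolve(x, w, mode="same") for w in filters])

def input_grad(d_out, filters):
    # dL/dx = sum_f dL/dy_f * flip(w_f)  (exact for 'same' mode with odd kernel sizes)
    d_x = np.zeros(d_out.shape[1:])
    for d_out_f, w in zip(d_out, filters):
        d_x += signal.fftconvolve(d_out_f, np.flip(w), mode="same")
    return d_x

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 5, 4))           # 3-D input
filters = rng.standard_normal((2, 3, 3, 3))  # 2 filters of size 3x3x3
g = rng.standard_normal((2, 6, 5, 4))        # stands in for dL/dy

# Central differences of L(x) = sum(forward(x) * g); L is linear in x,
# so the estimate is exact up to rounding error
eps = 1e-3
num = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += eps
    xm[idx] -= eps
    num[idx] = (np.sum(forward(xp, filters) * g)
                - np.sum(forward(xm, filters) * g)) / (2 * eps)

print(np.allclose(input_grad(g, filters), num, atol=1e-6))
```

Replacing `np.flip(w)` with `w.T` in `input_grad` makes this comparison fail in general, because `.T` transposes rather than flips.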
The derivatives for the weights are given by the (summed) convolution of the derivatives of the current layer with the original input:

$$\frac{\partial L}{\partial w_f} = \frac{\partial L}{\partial y_f} * x$$
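A minimal sketch of the weight gradient under the same assumptions (single-channel input, forward pass `signal.fftconvolve(x, w_f, "same")`); `forward` and `weight_grad` are hypothetical helper names. In "same" mode SciPy keeps a centred window of the full convolution starting at offset (k - 1) // 2 on each axis, so padding the input to match that offset lets a "valid" correlation with dL/dy_f return an array with exactly the kernel's shape. The result is checked against finite differences.

```python
import numpy as np
from scipy import signal

def forward(x, filters):
    # y[f] = x convolved with filter f; 'same' mode keeps x's shape
    return np.stack([signal.fftconvolve(x, w, mode="same") for w in filters])

def weight_grad(d_out, x, kernel_shape):
    # Offset of the 'same' window inside the full convolution, per axis
    s = [(k - 1) // 2 for k in kernel_shape]
    # Zero-pad x so that a 'valid' correlation with dL/dy_f has the kernel's shape
    x_pad = np.pad(x, [(k - 1 - si, si) for k, si in zip(kernel_shape, s)])
    d_w = []
    for d_out_f in d_out:
        # Correlation of x_pad with dL/dy_f, written as convolution with flipped dL/dy_f
        corr = signal.fftconvolve(x_pad, np.flip(d_out_f), mode="valid")
        # Flip back to kernel index order
        d_w.append(np.flip(corr))
    return np.stack(d_w)

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 5, 4))
filters = rng.standard_normal((2, 3, 3, 3))
g = rng.standard_normal((2, 6, 5, 4))  # stands in for dL/dy

# Central differences of L(w) = sum(forward(x, w) * g); L is linear in w
eps = 1e-3
num = np.zeros_like(filters)
for idx in np.ndindex(filters.shape):
    wp, wm = filters.copy(), filters.copy()
    wp[idx] += eps
    wm[idx] -= eps
    num[idx] = (np.sum(forward(x, wp) * g)
                - np.sum(forward(x, wm) * g)) / (2 * eps)

print(np.allclose(weight_grad(g, x, filters.shape[1:]), num, atol=1e-6))
```

With a mini-batch, the per-sample weight gradients would be summed (or averaged) before updating the filters.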

I have implemented this below as best I know how. However, it does not seem to give the intended result: the network I have written using this layer exhibits wild fluctuations during training.

    import numbers

    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    from scipy import signal

    class ConvNDLayer:
        def __init__(self, channels, kernel_size, dim):
            self.channels = channels
            self.kernel_size = kernel_size
            self.dim = dim

            self.last_input = None

            # Filter bank of shape (channels, kernel_size, ..., kernel_size)
            self.filt_dims = (channels,) + (kernel_size,) * dim
            self.filters = np.random.randn(*self.filt_dims) / kernel_size**dim

        def FlipAllAxes(self, array):
            # Reverse the element order along every axis (equivalent to np.flip(array))
            sl = slice(None, None, -1)
            return array[tuple([sl] * array.ndim)]

        def ViewAsWindows(self, array, window_shape, step=1):
            # Rolling-window view of `array` (adapted from skimage.util.view_as_windows)
            # -- basic checks on arguments
            if not isinstance(array, np.ndarray):
                raise TypeError("`array` must be a NumPy ndarray")
            ndim = array.ndim
            if isinstance(window_shape, numbers.Number):
                window_shape = (window_shape,) * ndim
            if len(window_shape) != ndim:
                raise ValueError("`window_shape` is incompatible with `array.shape`")

            if isinstance(step, numbers.Number):
                if step < 1:
                    raise ValueError("`step` must be >= 1")
                step = (step,) * ndim
            if len(step) != ndim:
                raise ValueError("`step` is incompatible with `array.shape`")

            arr_shape = np.array(array.shape)
            window_shape = np.asarray(window_shape, dtype=arr_shape.dtype)

            if ((arr_shape - window_shape) < 0).any():
                raise ValueError("`window_shape` is too large")
            if ((window_shape - 1) < 0).any():
                raise ValueError("`window_shape` is too small")

            # -- build rolling window view
            slices = tuple(slice(None, None, st) for st in step)
            window_strides = array.strides
            indexing_strides = array[slices].strides
            win_indices_shape = ((arr_shape - window_shape) // np.array(step)) + 1

            new_shape = tuple(list(win_indices_shape) + list(window_shape))
            strides = tuple(list(indexing_strides) + list(window_strides))
            return as_strided(array, shape=new_shape, strides=strides)

        def UnrollAxis(self, array, axis):
            # Accept a single axis or a sequence of axes
            axis = np.atleast_1d(axis).tolist()
            # Move the unrolled axes to the front, then merge them into one
            array = np.moveaxis(array, axis, list(range(len(axis))))
            return array.reshape((-1,) + array.shape[len(axis):])

        def Forward(self, array):
            # One output channel per filter; 'same' mode keeps the input's shape
            self.last_input = array
            output = np.zeros((self.channels,) + array.shape)
            for i, kernel in enumerate(self.filters):
                output[i] = signal.fftconvolve(array, kernel, "same")
            return output

        def Backprop(self, d_L_d_out, learn_rate):
            d_A = np.zeros_like(self.last_input)
            d_W = np.zeros_like(self.filters)

            for i, (kernel, d_L_d_out_f) in enumerate(zip(self.filters, d_L_d_out)):
                # Note: on an N-D array, .T reverses the order of the axes (a transpose);
                # it does not flip the kernel the way FlipAllAxes does.
                d_A += signal.fftconvolve(d_L_d_out_f, kernel.T, "same")
                conv = signal.fftconvolve(d_L_d_out_f, self.last_input, "same")
                conv = self.ViewAsWindows(conv, kernel.shape)
                axes = np.arange(kernel.ndim)
                conv = self.UnrollAxis(conv, axes)
                d_W[i] = np.sum(conv, axis=0)

            output = d_A * learn_rate
            self.filters = self.filters - d_W * learn_rate
            return output

Answer (Mar 02 '20 at 08:05)

Multiplying the gradients by learn_rate is usually not enough.

For better performance and to reduce heavy fluctuations, optimizers rescale the gradients, for example by dividing each gradient by the square root of a running average of recent squared gradients (RMSprop).
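A minimal sketch of an RMSprop-style update in NumPy. The function name `rmsprop_step` and the defaults for `rho` and `eps` are assumptions (commonly used values), not something the answer specifies; `w` would be, for instance, the layer's filters and `grad` their gradient.

```python
import numpy as np

def rmsprop_step(w, grad, cache, learn_rate=1e-3, rho=0.9, eps=1e-8):
    # Exponentially decaying average of the squared gradients
    cache = rho * cache + (1 - rho) * grad ** 2
    # Divide by its square root: large or noisy gradients give proportionally smaller steps
    w = w - learn_rate * grad / (np.sqrt(cache) + eps)
    return w, cache

# Usage: minimise sum(w**2) for a few steps
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for _ in range(3):
    grad = 2 * w  # gradient of sum(w**2)
    w, cache = rmsprop_step(w, grad, cache, learn_rate=0.1)
print(w)
```

The running cache has to persist between calls (for example as an attribute next to `self.filters`), otherwise every step is normalised by a single gradient only.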

The updates also depend on the error. If you backpropagate the error of every sample individually, the updates are usually noisy, so it is considered better to average over multiple samples (mini-batches).
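A minimal sketch of mini-batch averaging on a toy problem, assuming plain gradient descent; `iterate_minibatches`, `minibatch_step` and the linear-fit example are hypothetical illustrations, not part of the question's layer. Each update uses the mean gradient of a batch instead of one sample's gradient.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    # Shuffle once per epoch, then yield index chunks of (at most) batch_size
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

def minibatch_step(w, per_sample_grads, learn_rate):
    # One update from the *mean* gradient of the batch, not one update per sample
    return w - learn_rate * np.mean(per_sample_grads, axis=0)

# Toy example: fit y = 3x with squared error, batches of 8
rng = np.random.default_rng(0)
X = rng.standard_normal(64)
Y = 3.0 * X
w = 0.0
for epoch in range(20):
    for idx in iterate_minibatches(len(X), 8, rng):
        grads = 2 * (w * X[idx] - Y[idx]) * X[idx]  # d/dw of (w*x - y)^2, per sample
        w = minibatch_step(w, grads, learn_rate=0.1)
print(round(float(w), 3))
```

For the convolutional layer, the same idea means summing or averaging `d_W` over the samples of a batch before changing `self.filters`.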
