I am new to Batch Normalization and, after some self-study, I am trying to implement it with the help of ChatGPT. Since this is for a CNN, the input is a 4-D array with shape (batch size, height, width, channels). I initialize the scaling factor G and the shifting factor B as numpy arrays of shape (1, 1, 1, channels). However, during backpropagation the derivatives with respect to G and B both come out with shape (batch size, height, width, channels), and when I apply the update, broadcasting turns G and B into arrays of that shape as well. Where am I going wrong?
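To illustrate what I mean, here is a minimal shape check with made-up sizes (these are not my real layer dimensions):

import numpy as np

A = np.random.randn(20, 12, 12, 8)   # activations: (batch size, height, width, channels)
G = np.ones((1, 1, 1, 8))            # scaling factor, one value per channel
dP = np.random.randn(20, 12, 12, 8)  # upstream gradient, same shape as the activations

dG = dP * A                          # shape (20, 12, 12, 8) instead of (1, 1, 1, 8)
G = G - 1e-3 * dG                    # broadcasting blows G up to (20, 12, 12, 8)
print(G.shape)                       # (20, 12, 12, 8)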
I also looked at this answer regarding Batch Normalization, and the way I standardize the activations seems to differ from it. Could you please check my implementation and help me find my mistake?
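If I understand that answer correctly, it standardizes with epsilon added once inside the square root, roughly like this (my paraphrase, not its exact code):

mean = np.mean(A1, axis=(0, 1, 2), keepdims=True)
var = np.var(A1, axis=(0, 1, 2), keepdims=True)
M1 = (A1 - mean) / np.sqrt(var + epsilon)

whereas I add epsilon to the variance and then, for the first layer, add it again to the standard deviation.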
Here is my code:
epsilon = 1e-5
for i in range(5):
    Z1 = Conv1.convolve(example)
    A1 = NN.activation("relu", Z1)

    # batch norm for the first conv layer
    mean1 = np.mean(A1, axis=(0, 1, 2), keepdims=True)
    var1 = np.var(A1, axis=(0, 1, 2), keepdims=True) + epsilon
    std1 = np.sqrt(var1)
    M1 = (A1 - mean1) / (std1 + epsilon)
    N1 = M1 * G1 + B1

    if i == 0:
        Conv2.initialize(N1)

    Z2 = Conv2.convolve(N1)
    A2 = NN.activation("relu", Z2)

    # batch norm for the second conv layer
    mean2 = np.mean(A2, axis=(0, 1, 2), keepdims=True)  # Shape: (1, 1, 1, num_channels)
    var2 = np.var(A2, axis=(0, 1, 2), keepdims=True) + epsilon
    std2 = np.sqrt(var2)  # Shape: (1, 1, 1, num_channels)
    M2 = (A2 - mean2) / std2
    N2 = M2 * G2 + B2

    # flatten for the fully connected part
    N3 = N2.reshape(N2.shape[0], -1).T
    pred = NN.forward_propagation(N3, "v", return_=True)

    # BACKPROPAGATION
    dZ = NN.back_propagtaion(activation="Sigmoid", activation_prev=NN.parameter["A0"],
                             m=20, layer=1, Y_true=yy, activation_cur=NN.parameter["A1"])
    dNN = np.dot(dZ, NN.Weight[1])
    dP = dNN.reshape(N2.shape)

    # Derivative of scaling and shifting factor (layer 2)
    dG2 = dP * M2
    dB2 = np.sum(dP, axis=(0, 1, 2))
    dZ2 = dP * G2
    m = A2.shape[0]
    dsigma2 = dZ2 * (A2 - mean2) * (-1 / (std2 ** 2))
    dmu2 = -np.sum(dZ2 * G2 / std2, axis=(0, 1, 2)) + dsigma2 * np.sum(-2 * (A2 - mean2), axis=(0, 1, 2)) / m
    dback2 = dZ2 * G2 / std2 + dsigma2 * 2 * (A2 - mean2) / m + dmu2 / m
    d_act2 = NN.activation_derivative("relu", Z2)
    dZ2 = dback2 * d_act2
    dK2, dW2, db2 = Conv2.backprop(dZ2)

    # Derivative of scaling and shifting factor (layer 1)
    dG1 = dK2 * M1
    dB1 = np.sum(dK2, axis=(0, 1, 2))
    dZ1 = dK2 * G1
    dsigma1 = dZ1 * (A1 - mean1) * (-1 / (std1 ** 2))
    dmu1 = -np.sum(dZ1 * G1 / std1, axis=(0, 1, 2)) + dsigma1 * np.sum(-2 * (A1 - mean1), axis=(0, 1, 2)) / m
    dback1 = dZ1 * G1 / std1 + dsigma1 * 2 * (A1 - mean1) / m + dmu1 / m
    d_act1 = NN.activation_derivative("relu", Z1)
    dZ1 = dback1 * d_act1
    dK1, dW1, db1 = Conv1.backprop(dZ1)

    # parameter updates
    lr = 1e-3
    Conv2.K -= lr * dW2
    Conv2.b -= lr * db2
    Conv1.K -= lr * dW1
    Conv1.b -= lr * db1
    G2 -= lr * dG2
    B2 -= lr * dB2
    G1 -= lr * dG1
    B1 -= lr * dB1