
I am building a vanilla neural network from scratch using NumPy and trialling the model's performance with different activation functions. I am especially keen to see how the 'Maxout' activation function would affect my model's performance.

After some searching, I was not able to find a NumPy implementation, only the definition (https://ibb.co/kXCpjKc). The forward-propagation formula is clear: I take max(Z) (where Z = w.T * x + b). But the derivative that I will need for backpropagation is not clear to me.
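For context, the forward pass I have in mind looks roughly like this (a sketch for a single maxout unit; the shapes of W and b are my own assumptions):

```python
import numpy as np

# Sketch of a single maxout unit's forward pass.
# x: input vector of d features
# W: (k, d) matrix holding the k weight vectors (one per linear piece)
# b: (k,) vector of biases
def maxout(x, W, b):
    z = W @ x + b        # z_j = w_j . x + b_j for each of the k pieces
    return np.max(z)     # the maxout output is the largest z_j
```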

What does j = argmax(z) mean in this context? How do I implement it in NumPy?

Any help would be much appreciated! Thank you!

Abishek

1 Answer


Changing any of the non-maximum values slightly does not affect the output, so their gradient is zero. The gradient from the next layer is passed only to the neuron that achieved the maximum (gradient = 1 in the link you provided). See this Stack Exchange answer: https://datascience.stackexchange.com/a/11703.

In a neural network setting you need the gradient with respect to each of the x_i, so you need the full derivative. In the link you provided, only a partial derivative is defined. That partial derivative is a vector (almost all zeros, with a 1 at the position of the maximum neuron), so the full gradient becomes a matrix. The derivative with respect to the pre-activations z is written out below.
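Concretely (restating the definition from the linked image in my own notation), for a single maxout unit with j = argmax_i z_i, the partial derivative of the output with respect to each z_i is

$$\frac{\partial}{\partial z_i}\,\max_k z_k \;=\; \begin{cases} 1 & \text{if } i = \arg\max_k z_k \\ 0 & \text{otherwise,} \end{cases}$$

so with z = Wx + b the gradient with respect to x is just the winning row w_j of W, and all other rows contribute zero.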

You can implement this in NumPy using np.argmax; a sketch follows below.
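A minimal sketch of how that could look, assuming a batch of N inputs with d features, m maxout units, and k affine pieces per unit (the shapes and names here are my own choices, not taken from the linked definition):

```python
import numpy as np

def maxout_forward(X, W, b):
    # X: (N, d) batch of inputs, W: (d, m, k) weights, b: (m, k) biases
    Z = np.einsum('nd,dmk->nmk', X, W) + b      # all affine pieces: (N, m, k)
    j = np.argmax(Z, axis=-1)                   # winning piece per unit: (N, m)
    A = np.take_along_axis(Z, j[..., None], axis=-1).squeeze(-1)  # = Z.max(-1)
    return A, Z, j

def maxout_backward(dA, X, W, Z, j):
    # dA: (N, m) gradient arriving from the next layer.
    # Route the gradient only to the argmax piece; every other piece gets zero,
    # exactly as described above.
    dZ = np.zeros_like(Z)                                   # (N, m, k)
    np.put_along_axis(dZ, j[..., None], dA[..., None], axis=-1)
    dW = np.einsum('nd,nmk->dmk', X, dZ)                    # gradient w.r.t. W
    db = dZ.sum(axis=0)                                     # gradient w.r.t. b
    dX = np.einsum('nmk,dmk->nd', dZ, W)                    # gradient w.r.t. X
    return dX, dW, db

# Quick usage check with random data
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))        # 4 samples, 3 features
W = rng.standard_normal((3, 5, 2))     # 5 maxout units, 2 pieces each
b = rng.standard_normal((5, 2))
A, Z, j = maxout_forward(X, W, b)
dX, dW, db = maxout_backward(np.ones_like(A), X, W, Z, j)
print(A.shape, dX.shape, dW.shape, db.shape)   # (4, 5) (4, 3) (3, 5, 2) (5, 2)
```

Note that np.argmax picks the first maximum in the rare case of ties, which is the usual convention for max-type activations.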

Frederik Bode