I am building a vanilla neural network from scratch using NumPy and evaluating its performance with different activation functions. I am especially keen to see how the Maxout activation function would affect my model's performance.
After searching, I was not able to find a NumPy implementation, only the definition (https://ibb.co/kXCpjKc). The forward-propagation formula is clear to me: each Maxout unit computes k affine pieces z_j = w_j.T x + b_j and outputs max_j(z_j). However, the derivative that I would use in backpropagation is not clear to me.
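For reference, here is a minimal sketch of my forward pass, assuming k linear pieces per Maxout unit (the weight shapes and the helper name `maxout_forward` are just my own choices):

```python
import numpy as np

def maxout_forward(x, W, b):
    """Maxout forward pass (my sketch).
    x: (n_in,)           input vector
    W: (k, n_out, n_in)  weights for the k linear pieces
    b: (k, n_out)        biases for the k linear pieces
    Returns the element-wise max over the k affine pieces.
    """
    # z[j] = W[j] @ x + b[j] for each of the k pieces -> shape (k, n_out)
    z = np.einsum('kij,j->ki', W, x) + b
    return z.max(axis=0)  # shape (n_out,)
```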
What does j = argmax(z) mean in this context? How do I implement it in NumPy?
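If I understand correctly, j = argmax(z) is the index of the piece that achieved the max, so the gradient would flow only through that piece. Here is my rough attempt at the backward pass (reusing the shapes from the forward sketch above; I am not sure this is right):

```python
def maxout_backward(x, W, b, grad_out):
    """My attempt at the Maxout backward pass.
    grad_out: (n_out,) upstream gradient dL/dh.
    Returns gradients w.r.t. W, b, and x.
    """
    z = np.einsum('kij,j->ki', W, x) + b   # (k, n_out)
    j = np.argmax(z, axis=0)               # winning piece per unit, shape (n_out,)
    dW = np.zeros_like(W)
    db = np.zeros_like(b)
    units = np.arange(z.shape[1])
    # Only the argmax piece receives gradient; all other pieces get zero.
    dW[j, units] = grad_out[:, None] * x[None, :]
    db[j, units] = grad_out
    # Gradient w.r.t. the input, summed over output units.
    dx = np.einsum('i,ij->j', grad_out, W[j, units])
    return dW, db, dx
```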
Any help would be much appreciated! Thank you!