The only example of a maxout implementation in Theano that I could find is at this link. My understanding is that I can use any activation function, and maxout is then just a post-processing step applied to the hidden layer outputs. (The maxout paper defines a unit as h_i(x) = max_{j in 1..k} z_ij over k separate linear pre-activations, so I may be conflating that max with the usual nonlinearity; that is essentially what I want to check.)
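Concretely, the operation I have in mind looks like this toy NumPy sketch (the sizes are made up for illustration: a batch of 2 samples, 6 activated outputs, pooled in groups of maxoutsize = 3):

import numpy as np

# toy 'activated' hidden layer output: 2 samples x 6 units
output = np.arange(12).reshape(2, 6)
maxoutsize = 3  # pool size, chosen only for this illustration

maxout_out = None
for i in range(maxoutsize):
    t = output[:, i::maxoutsize]  # columns i, i + maxoutsize, ...
    maxout_out = t if maxout_out is None else np.maximum(maxout_out, t)

print(maxout_out)
# [[ 2  5]
#  [ 8 11]]

Each output column is the max over maxoutsize consecutive columns of the activated output, so the layer width shrinks from 6 to 2.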
I tried to apply this to my own HiddenLayer class. Below is the class before maxout:
import theano.tensor as T

class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh):
        '''
        Initialise the hidden layer.
        Parameters:
        rng        - random number generator
        input      - input values from the preceding layer
        n_in       - number of input nodes (number of nodes of the preceding layer)
        n_out      - number of output nodes (number of nodes of this hidden layer)
        W          - the weights of the layer
        b          - the bias of the layer
        activation - the activation function, e.g. T.tanh or relu
        '''
        self.input = input
        # initialise the weights and biases of the hidden layer
        W, b = self.init_weights(rng, n_in, n_out, W, b, activation)
        self.W = W
        self.b = b
        lin_output = T.dot(input, self.W) + self.b
        self.output = (lin_output if activation is None else activation(lin_output))
        # parameters of the model
        self.params = [self.W, self.b]
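For completeness, here is roughly what my init_weights helper does (it is essentially the standard initialisation from the Theano MLP tutorial; the exact details shouldn't matter for the maxout question):

import numpy
import theano

# inside class HiddenLayer:
def init_weights(self, rng, n_in, n_out, W, b, activation):
    if W is None:
        # uniform initialisation scaled by sqrt(6 / (n_in + n_out)), as in the tutorial
        W_values = numpy.asarray(
            rng.uniform(low=-numpy.sqrt(6. / (n_in + n_out)),
                        high=numpy.sqrt(6. / (n_in + n_out)),
                        size=(n_in, n_out)),
            dtype=theano.config.floatX)
        if activation == T.nnet.sigmoid:
            W_values *= 4  # the tutorial's scaling for sigmoid units
        W = theano.shared(value=W_values, name='W', borrow=True)
    if b is None:
        b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
        b = theano.shared(value=b_values, name='b', borrow=True)
    return W, b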
If I understood the link correctly, the class after implementing maxout should look as below. Is this correct? If not, could you point out which part I misunderstood?
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh, maxout=False):
        '''
        maxout - whether to apply maxout after the activation function
        '''
        self.input = input
        # initialise the weights and biases of the hidden layer
        W, b = self.init_weights(rng, n_in, n_out, W, b, activation)
        self.W = W
        self.b = b
        lin_output = T.dot(input, self.W) + self.b
        self.output = (lin_output if activation is None else activation(lin_output))
        if maxout:  # apply maxout to the 'activated' hidden layer output
            maxout_out = None
            maxoutsize = n_out  # pool size (set to n_out here)
            for i in xrange(maxoutsize):
                # pick every maxoutsize-th column, starting at offset i
                t = self.output[:, i::maxoutsize]
                if maxout_out is None:
                    maxout_out = t
                else:
                    # elementwise max across the slices seen so far
                    maxout_out = T.maximum(maxout_out, t)
            self.output = maxout_out
        # parameters of the model
        self.params = [self.W, self.b]
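To see what this version actually computes, I compiled the layer and checked the output shape (a quick sanity check, assuming the init_weights sketch above; the sizes are arbitrary):

import numpy
import theano
import theano.tensor as T

x = T.matrix('x')
rng = numpy.random.RandomState(1234)
layer = HiddenLayer(rng, input=x, n_in=6, n_out=6, activation=T.tanh, maxout=True)
f = theano.function([x], layer.output)
sample = numpy.ones((2, 6), dtype=theano.config.floatX)
print(f(sample).shape)  # prints (2, 1)

Because maxoutsize = n_out, each slice self.output[:, i::maxoutsize] is a single column, so the loop collapses the whole layer to one value per sample. I suspect maxoutsize should really be a separate, smaller parameter, but I am not sure; hence the question.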