I'm new to Theano. I've learned the basics and am trying to implement simple models (logistic regression, etc.). The model is very simple: 784 (28*28) input units with a 10-unit softmax non-linearity on top, trained on the MNIST dataset. I'm using binary_crossentropy as the loss function and an L2 regularizer to prevent overfitting. But it seems like the model is still overfitting (judging by the model's weights, shown below). I tried changing the regularization parameter (lambda), but nothing works. Where did I go wrong? Thanks in advance.
# Theano imports
from theano import shared, function
import theano.tensor as T
import numpy as np
import matplotlib.pyplot as plt
n_feat = 28 * 28  # number of input features (one per pixel)
m_sample = 60000  # number of MNIST training examples
n_class = 10      # number of output classes (digits 0-9)
W_shape = (n_class, n_feat)
B_shape = (1, n_class)
# initialize parameters uniformly in [0, 1)
W_param = np.random.random(W_shape)
B_param = np.random.random(B_shape)
# shared variables so the parameters can be updated in place
W = shared(W_param, name='W', borrow=True)
B = shared(B_param, name='B', borrow=True, broadcastable=(True, False))
X = T.dmatrix('X')  # input batch, shape (m, n_feat)
O = T.nnet.softmax(X.dot(W.transpose()) + B)  # class probabilities, shape (m, n_class)
prediction = T.argmax(O, axis=1)  # predicted class for each row
L = T.dmatrix('L')  # one-hot target labels, shape (m, n_class)
lam = 0.05  # regularization parameter lambda
# alternative losses I tried, kept for reference:
# loss_meansqr = ((O - L) ** 2).mean()
# loss_meansqr_reg = ((O - L) ** 2).mean() + lam * ((W ** 2).mean() + (B ** 2).mean())
# loss_binxent = T.nnet.binary_crossentropy(O, L).mean()
loss_binxent_reg = T.nnet.binary_crossentropy(O, L).mean() + lam * ((W ** 2).mean() + (B ** 2).mean())  # I'm using this one
loss = loss_binxent_reg
gW = T.grad(loss, W)  # gradient of the loss w.r.t. W
gB = T.grad(loss, B)  # gradient of the loss w.r.t. B
lr = T.dscalar('lr')  # learning rate, passed in at call time
upds = [(W, W - lr * gW), (B, B - lr * gB)]  # plain gradient-descent updates
print('Compiling functions...')
train = function([X, L, lr], [loss], updates=upds)  # one gradient step; returns the loss
predict = function([X], prediction)
print('Functions compiled')
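For completeness, this is roughly how I drive the training (a sketch, not my exact script; train_X and train_L stand for the MNIST images and their one-hot labels, loaded elsewhere):

# train_X: (60000, 784) float array, train_L: (60000, 10) one-hot labels
for epoch in range(100):
    cost = train(train_X, train_L, 0.1)  # full-batch gradient step with lr = 0.1
    print('epoch %d, loss %f' % (epoch, cost[0]))  # train returns [loss]
preds = predict(train_X)  # predicted digit for each example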
The weights look like this: [image: the weights of the model]
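The image was produced roughly like this (again a sketch; reshaping each class's weight row back to 28x28 is my assumption about the layout):

# visualize each class's weight vector as a 28x28 image
fig, axes = plt.subplots(2, 5)
W_vals = W.get_value()  # pull the learned weights out of the shared variable
for i, ax in enumerate(axes.ravel()):
    ax.imshow(W_vals[i].reshape(28, 28), cmap='gray')
    ax.set_title(str(i))
    ax.axis('off')
plt.show()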