
I'm new to Theano. I've learnt the basics and am trying to implement simple models (logistic regression etc.). This model is very simple: 784 (28*28) input units with a 10-unit softmax non-linearity on top, trained on the MNIST dataset. I'm using binary_crossentropy as the loss function and an L2 regularizer to prevent overfitting. But the model still seems to be overfitting (judging by the weights of the model, shown below). I tried changing the regularization parameter (lambda) but nothing works. Where did I go wrong? Thanks in advance.

# theano stuff
from theano import shared, function, pp
import theano.tensor as T
import numpy as np
import matplotlib.pyplot as plt
n_feat = 28*28
m_sample = 60000
n_class = 10
W_shape = (n_class, n_feat)
B_shape = (1, n_class)
W_param = np.random.random(W_shape)
B_param = np.random.random(B_shape)

W = shared(W_param, name='W', borrow=True)
B = shared(B_param, name='B', borrow=True, broadcastable=(True, False)) # bias row broadcast over samples
X = T.dmatrix('X') # input batch, shape (m_sample x n_feat)
O = T.nnet.softmax(X.dot(W.transpose())+B) # class probabilities, shape (m_sample x n_class)
prediction = T.argmax(O, axis=1)
L = T.dmatrix('L') # one-hot labels, shape (m_sample x n_class)
lam = 0.05 # regularization parameter lambda

# loss_meansqr = (((O-L)**2).mean()).mean()
# loss_meansqr_reg = (((O-L)**2).mean()).mean() + lam *((W**2).mean()+(B**2).mean())
# loss_binxent = T.nnet.binary_crossentropy(O,L).mean()

loss_binxent_reg = T.nnet.binary_crossentropy(O,L).mean() + lam*((W**2).mean()+(B**2).mean()) # i'm using this one
loss = loss_binxent_reg
gW = T.grad(loss, W)
gB = T.grad(loss, B)
lr = T.dscalar('lr')
upds = [(W, W-lr*gW), (B, B-lr*gB)]
print 'Compiling functions...'
train = function([X,L,lr], [loss], updates=upds)
predict = function([X],prediction)
print 'Functions compiled'
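
For context, here is roughly how I drive the compiled functions. This is only a sketch: X_train / Y_train (the MNIST images as an (m_sample x n_feat) matrix and the labels as one-hot rows) and n_epochs are placeholders for my data-loading code, which I've omitted.

# full-batch gradient descent loop (sketch); X_train / Y_train / n_epochs
# stand in for the actual MNIST loading code
n_epochs = 100
for epoch in range(n_epochs):
    cost = train(X_train, Y_train, 0.01)[0]   # train returns [loss]
    print 'epoch %d, loss %f' % (epoch, cost)

pred = predict(X_train)
print 'training accuracy:', np.mean(pred == Y_train.argmax(axis=1))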

The weights look like this: [image: the learned weights of the model]

ayandas

1 Answer


Not sure if this is what caused the problem, but shouldn't the loss function be a categorical cross-entropy rather than a binary cross-entropy?

The MNIST task is to classify each image into one class; an image cannot belong to many classes. Binary cross-entropy is the appropriate loss when an item can belong to many classes and categorical cross-entropy is the appropriate loss when an item can belong to only one class.

I would also recommend trying this without any regularization initially (remove that component from the loss function altogether while testing), and make sure your learning rate is small enough (e.g. 0.001 should work).
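
As a rough sketch of what I mean (untested, reusing the variable names from your code), the change would look something like this:

# categorical cross-entropy expects rows of O to be class probabilities and
# rows of L to be one-hot label vectors (or L can be integer class indices)
loss_catxent = T.nnet.categorical_crossentropy(O, L).mean()

# no regularization term while debugging
gW = T.grad(loss_catxent, W)
gB = T.grad(loss_catxent, B)
upds = [(W, W - lr*gW), (B, B - lr*gB)]
train = function([X, L, lr], [loss_catxent], updates=upds)

# then train with a small learning rate, e.g. train(X_batch, L_batch, 0.001)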

Daniel Renshaw