It is no surprise that each of the separate networks performs best on the training set it was trained on. But these prediction error values are misleading, because minimizing the error on the training set alone is an ill-posed goal. Your ultimate goal is to maximize the generalization performance of your model, i.e. how well it performs on new data it has not seen during training. Imagine a network that simply memorizes each of the characters and thus behaves more like a hash table: it would yield 0 errors on the training data but perform badly on any other data.
One way to measure generalization performance is to hold out a fraction (e.g. 10%) of your available data and use it as a test set. You do not use this test set during training, only for measurement.
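In PyBrain this split is a one-liner via `splitWithProportion`. A minimal sketch, assuming your samples live in a `ClassificationDataSet` (the dataset name and dimensions here are placeholders):

```python
from pybrain.datasets import ClassificationDataSet

# Hypothetical dataset: 784 input features (e.g. 28x28 pixels), 1 target
# column holding the class index, 10 classes.
alldata = ClassificationDataSet(784, 1, nb_classes=10)
# ... alldata.addSample(image_vector, [class_index]) for each sample ...

# Hold out 10% of the samples for testing; train only on the rest.
tstdata, trndata = alldata.splitWithProportion(0.10)
```

You then pass `trndata` to the trainer and use `tstdata` only for evaluation.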
Further, you should check the topology of your network. How many hidden layers and how many neurons per hidden layer do you use? Make sure the topology is large enough to capture the complexity of your problem.
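For reference, this is how a topology is specified with PyBrain's `buildNetwork` shortcut; the layer sizes below are placeholders, not a recommendation:

```python
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import SoftmaxLayer

# 784 input neurons, one hidden layer of 300 neurons, 10 output classes.
# A single hidden layer of a few hundred units is a common starting point
# for MNIST-sized problems; tune this for your own task.
net = buildNetwork(784, 300, 10, outclass=SoftmaxLayer)
```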
Also have a look at other techniques to improve the generalization performance of your network, like L1 regularization (subtracting a small fixed amount from the magnitude of each weight after every training step, i.e. moving each weight towards zero by a constant step), L2 regularization (subtracting a small percentage of each weight after every training step) or dropout (randomly turning off hidden units during training and halving the weights once training is finished). Further, you should consider more efficient training algorithms like RProp- or RMSProp rather than plain backpropagation (see Geoffrey Hinton's Coursera course on neural networks). Finally, the MNIST dataset of handwritten digits 0-9 is a good benchmark for testing your setup (you should easily achieve fewer than 300 misclassifications on its test set).
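Neither decay variant is hard to add by hand. A minimal sketch, assuming `net.params` exposes the network's weights as a writable NumPy view (which it does for standard PyBrain networks); the function name and lambda values are my own:

```python
import numpy as np

# Hypothetical decay constants; tune them for your problem.
L1_LAMBDA = 1e-5  # fixed step towards zero (L1)
L2_LAMBDA = 1e-4  # fraction of each weight to subtract (L2)

def decay_weights(net):
    """Apply one L1 + L2 decay step to all weights of a PyBrain net.

    net.params is a flat NumPy view of every weight in the network,
    so in-place updates take effect immediately.
    """
    w = net.params
    w *= (1.0 - L2_LAMBDA)       # L2: shrink by a small percentage
    w -= L1_LAMBDA * np.sign(w)  # L1: subtract a small fixed amount
```

Call `decay_weights(net)` after each training step. If I remember correctly, `BackpropTrainer` also accepts a `weightdecay` argument that implements the L2 variant directly.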
To answer your original question on how to omit certain output neurons: you can create your own layer module. Take the `SoftmaxLayer` as a starting point, and in `_forwardImplementation` zero out the entries of the `outbuf` variable that belong to the classes you want to omit (renormalizing afterwards so the remaining outputs still sum to 1). If you want to use this during training, also set the error signal of those classes to zero before backpropagating it to the previous layer, by overriding `_backwardImplementation`, as in the sketch below.
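A minimal sketch of such a layer (the class name and the `omit` parameter are my own inventions; the `_forwardImplementation`/`_backwardImplementation` hooks are PyBrain's):

```python
from pybrain.structure.modules import SoftmaxLayer

class PartialSoftmaxLayer(SoftmaxLayer):
    """SoftmaxLayer that masks out a configurable set of classes.

    `omit` (a hypothetical parameter, not part of PyBrain) holds the
    indices of the output neurons that should be suppressed.
    """

    def __init__(self, dim, omit=(), name=None):
        SoftmaxLayer.__init__(self, dim, name=name)
        self.omit = list(omit)

    def _forwardImplementation(self, inbuf, outbuf):
        # Compute the ordinary softmax first ...
        SoftmaxLayer._forwardImplementation(self, inbuf, outbuf)
        # ... then zero the omitted classes and renormalize so the
        # remaining outputs still form a probability distribution.
        for i in self.omit:
            outbuf[i] = 0.0
        total = outbuf.sum()
        if total > 0:
            outbuf /= total

    def _backwardImplementation(self, outerr, inerr, outbuf, inbuf):
        # SoftmaxLayer just passes the error through, so it suffices
        # to zero the error signal of the omitted classes here.
        inerr[:] = outerr
        for i in self.omit:
            inerr[i] = 0.0
```

Note that `buildNetwork` instantiates its `outclass` with the layer size only, so to pass `omit` you would assemble the network manually via `FeedForwardNetwork` and `FullConnection`.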
This can be useful e.g. if you have incomplete data and do not want to throw away every sample that contains just one NaN value. But in your case you actually do not need this.