
I have a data set with 150 rows, 45 features, and 40 outputs. I can easily overfit the data, but I cannot obtain acceptable results on my cross-validation set.

With 25 hidden layers and quite a large number of iterations, I was able to get ~94% accuracy on my training set, which put a smile on my face. But the cross-validation accuracy turned out to be less than 15%.

So to mitigate overfitting, I started playing with the regularization parameter (lambda) and the number of hidden layers. The best result I could get was 24% on the cross-validation set (34% on the training set) with lambda = 1, 70 hidden layers, and 14,000 iterations. Increasing the number of iterations also made it worse; I can't understand why I cannot improve the CV results with increased lambda and more iterations.

Here are the lambda/hidden-layer/iteration combinations I have tried:

https://docs.google.com/spreadsheets/d/11ObRTg05lZENpjUj4Ei3CbHOh5mVzF7h9PKHq6Yn6T4/edit?usp=sharing

Any suggested way(s) of trying smarter lambda/hidden-layer/iteration combinations? Or other ways of improving my NN? I am using my MATLAB code from Andrew Ng's ML class (it uses the backpropagation algorithm).

  • To improve your question you might want to add: 1. the type of NN you use (Kohonen, backpropagation, that new SDR one, ...) 2. whether you programmed it yourself or use a library or tool or whatever. – BitTickler Apr 17 '15 at 17:11
  • It uses the backpropagation algorithm; I added that to my question. Thanks! – cmelan Apr 17 '15 at 18:00
  • Consider moving this question to [Stats](http://stats.stackexchange.com/help/on-topic) as it doesn't have anything to do with programming, at least directly. – Яois Apr 21 '15 at 18:52

3 Answers


It's very hard to learn anything from just 150 training examples with 45 features (and, if I read your question right, 40 possible output classes). You need far more labeled training examples if you want to learn a reasonable classifier - probably tens or hundreds of thousands if you do have 40 possible classes. Even for binary classification or regression, you likely need thousands of examples if you have 45 meaningful features.
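
A quick way to confirm this diagnosis is a learning curve: train on growing subsets of the data and compare training accuracy against cross-validation accuracy. Below is a minimal sketch, assuming the helper functions from the class exercises (randInitializeWeights, nnCostFunction, fmincg, predict) are on your MATLAB path, and that X/y and Xcv/ycv are hypothetical names for your training and CV splits with labels coded 1..40:

```matlab
% Learning curve: train on growing subsets, compare train vs. CV accuracy.
input_layer_size  = 45;
hidden_layer_size = 25;              % units in the single hidden layer
num_labels        = 40;
lambda            = 1;
options           = optimset('MaxIter', 400);

sizes    = 30:30:size(X, 1);
trainAcc = zeros(numel(sizes), 1);
cvAcc    = zeros(numel(sizes), 1);

for i = 1:numel(sizes)
    m  = sizes(i);
    Xi = X(1:m, :);
    yi = y(1:m);

    % Random initialization, then minimize the regularized cost.
    Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
    Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
    costFn = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                 num_labels, Xi, yi, lambda);
    nn_params = fmincg(costFn, [Theta1(:); Theta2(:)], options);

    % Roll the optimized vector back into weight matrices.
    Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                     hidden_layer_size, input_layer_size + 1);
    Theta2 = reshape(nn_params(hidden_layer_size * (input_layer_size + 1) + 1:end), ...
                     num_labels, hidden_layer_size + 1);

    trainAcc(i) = mean(double(predict(Theta1, Theta2, Xi) == yi));
    cvAcc(i)    = mean(double(predict(Theta1, Theta2, Xcv) == ycv));
end

plot(sizes, trainAcc, sizes, cvAcc);
legend('train', 'CV'); xlabel('training examples'); ylabel('accuracy');
```

If the CV curve stays far below the training curve all the way out to the full 150 examples, that is the classic high-variance signature: more data (or a much simpler model) is what's needed.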

  • This is also what I concluded after some more attempts. At least for ANNs, training set size is vitally important. – cmelan May 05 '15 at 07:15

Some suggestions:

  • Overfitting occurs primarily when the structure of the neural network is too complex for the problem at hand. If the structure of the NN isn't too complex, increasing the number of iterations shouldn't decrease prediction accuracy.

  • 70 hidden layers is quite a lot; you could try dramatically decreasing the number of hidden layers (to 3-15) and increasing the number of iterations. Your spreadsheet suggests that 15 hidden layers already perform fine compared to 70.

  • While reducing the number of hidden layers, you may also vary the number of neurons per hidden layer (increase/decrease) and check how the results change - see the sketch after this list.
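
To make that search systematic instead of hand-picked, you can score each combination on the CV set and keep the best. A sketch of the idea, where trainAndScore is a hypothetical wrapper around the usual training steps (initialize weights, minimize nnCostFunction with fmincg for the given iteration budget, return CV accuracy):

```matlab
% Grid search over regularization strength, hidden size, and iterations,
% scored on the cross-validation set.
% trainAndScore is a HYPOTHETICAL wrapper around your existing training code:
%   acc = trainAndScore(X, y, Xcv, ycv, lambda, hiddenUnits, maxIter)
lambdas     = [0.01 0.1 0.3 1 3 10];
hiddenSizes = [5 10 15 25 50];
iterCounts  = [200 500 1000];

best = struct('acc', -Inf, 'lambda', NaN, 'hidden', NaN, 'iters', NaN);
for lambda = lambdas
    for h = hiddenSizes
        for it = iterCounts
            acc = trainAndScore(X, y, Xcv, ycv, lambda, h, it);
            if acc > best.acc
                best = struct('acc', acc, 'lambda', lambda, ...
                              'hidden', h, 'iters', it);
            end
        end
    end
end
fprintf('best CV accuracy %.2f: lambda=%g, %d hidden units, %d iterations\n', ...
        best.acc, best.lambda, best.hidden, best.iters);
```

With only 150 examples the CV scores will be noisy, so k-fold cross-validation (averaging each combination's score over several splits) is worth the extra compute here.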

– Konstantin

I agree with Logan. What you see in your dataset makes perfect sense. If you simply train an NN classifier with 45 features for 40 classes, you will get great training accuracy because you have more features than output classes. The model can basically "assign" each feature to one of the output classes, but the resulting model will be highly overfitted and probably won't represent whatever you are modeling. Your significantly lower cross-validation results are to be expected.

You should rethink your approach: why do you have 40 classes? Maybe you can turn your problem into a regression problem instead of a classification problem? Also try some other algorithms - Random Forest, for example (see the sketch below) - or decrease the number of features significantly.
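
If you want to try a Random Forest quickly, MATLAB's Statistics Toolbox provides TreeBagger (exact option names vary a little between releases). A minimal sketch, assuming X/y hold the 150x45 data with numeric class labels and Xcv/ycv a held-out split:

```matlab
% Random Forest baseline using MATLAB's Statistics Toolbox.
% Assumes X (150x45), y (150x1 numeric class labels), and a CV split Xcv/ycv.
B = TreeBagger(200, X, y, 'Method', 'classification', 'OOBPrediction', 'on');

plot(oobError(B));                    % out-of-bag error vs. number of trees
xlabel('number of trees'); ylabel('OOB misclassification rate');

yhat = str2double(predict(B, Xcv));   % predict returns labels as a cell array of strings
fprintf('CV accuracy: %.2f\n', mean(double(yhat == ycv)));
```

If the forest does no better, that is further evidence the data set is simply too small for 40 classes, whatever the model.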

– Jonidas