
I've been working with the MLPClassifier for a while, and I think I had a wrong interpretation of what the function is doing the whole time. I believe I have it right now, but I am not sure about that. So I will summarize my understanding, and it would be great if you could add your thoughts on the correct interpretation.

So with the MLPClassifier we are building a neural network based on a training dataset. Setting early_stopping = True it is possible to use a validation dataset within the training process in order to check whether the network is working on a new set as well. If early_stopping = False, no validation within the process is done. After one has finished building, we can use the fitted model in order to predict on a third dataset if we wish to. What I was thinking before is that during the whole training process a validation dataset is taken aside anyway, with validation happening after every epoch.

I'm not sure if my question is understandable, but it would be great if you could help me clear up my thoughts.

AnnaLee
  • I reformulated the question. I think what you are getting at is how the `early_stopping` parameter precisely makes use of a validation set. If the parameter is set, the algorithm splits the training dataset 90/10 into a training and a validation set at the start of the whole process (so not at every training step). After each step the intermediate model is evaluated on the validation set, and if the improvement is lower than `tol` the process stops early. The metric used is always 'accuracy', which is rather limited for a possibly multi-class classifier. – uberwach Oct 03 '20 at 12:27

1 Answer


The sklearn.neural_network.MLPClassifier uses (a variant of) Stochastic Gradient Descent (SGD) by default. Your question could be framed more generally as asking how SGD is used to optimize the parameter values in a supervised learning context; there is nothing specific to Multi-layer Perceptrons (MLP) here.

So with the MLPClassifier we are building a neural network based on a training dataset. Setting early_stopping = True it is possible to use a validation dataset within the training process

Correct, although it should be noted that this validation set is split off from the original training set (10% of it by default, controlled by the validation_fraction parameter).

in order to check whether the network is working on a new set as well.

Not quite. The point of early stopping is to track the validation score during training and stop training as soon as the validation score stops improving significantly.
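To make this concrete, here is a minimal sketch of what that looks like in scikit-learn. The toy data from make_classification and the chosen parameter values are just placeholders; validation_fraction, tol and n_iter_no_change are the actual parameters controlling the internal split and the stopping criterion:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Toy data as a stand-in for your own training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# With early_stopping=True, MLPClassifier internally sets aside
# validation_fraction (default 0.1) of the training data and scores the
# model on it after every epoch. Training stops once the validation score
# has not improved by at least tol for n_iter_no_change consecutive epochs.
clf = MLPClassifier(
    hidden_layer_sizes=(50,),
    early_stopping=True,
    validation_fraction=0.1,  # the 90/10 split mentioned in the comments
    n_iter_no_change=10,      # patience, in epochs
    tol=1e-4,                 # minimum improvement that counts as progress
    max_iter=500,
    random_state=0,
)
clf.fit(X, y)

print("epochs actually run:", clf.n_iter_)
print("validation score per epoch:", clf.validation_scores_)
```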

If early_stopping = False, no validation within the process is done. After one has finished building, we can use the fitted model in order to predict on a third dataset if we wish to.

Correct.

What I was thinking before is that during the whole training process a validation dataset is taken aside anyway, with validation happening after every epoch.

As you probably know by now, this is not so. The division of the learning process into epochs is somewhat arbitrary and has nothing to do with validation.
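If you do want a validation score after every epoch on a set you control yourself (the behaviour you originally had in mind), MLPClassifier will not do that for you when early_stopping = False, but you can emulate it with partial_fit and an explicit hold-out split. A rough sketch, where each partial_fit call is treated as one pass over the training data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Explicit split into a training part and a validation part that we manage ourselves.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(50,), random_state=0)

for epoch in range(50):
    # One partial_fit call performs one pass of updates over the data we pass in.
    clf.partial_fit(X_train, y_train, classes=np.unique(y))
    print(f"epoch {epoch:2d}: validation accuracy = {clf.score(X_val, y_val):.3f}")
```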

Arne
  • Thanks a lot for your answer! I have read a lot about neural networks as well, and it is often explained that a validation set within the process is used in order to validate the model and avoid overtraining. Is this only possible with other functions, then, or is there an option in the MLPClassifier as well? – AnnaLee Oct 03 '20 at 15:31
  • @AnnaLee The early stopping option is indeed used to avoid overfitting, but if you are looking for validation in the sense of tuning hyperparameters, then you will need to add another layer of complexity, e.g. by cross-validation. Of course there are scikit-learn classes for this as well, and you can combine them with the MLPClassifier (see the sketch after these comments). – Arne Oct 03 '20 at 22:37
  • Ok great, thank you very much! My current plan is combining different layer sizes in a first step (using training and validation loss) and then tuning all other hyperparameters using grid search with cross-validation for the most successful layer size(s) from the first step. What do you think? – AnnaLee Oct 04 '20 at 09:08
  • @AnnaLee Grid search with cross-validation is a good choice if it doesn't take too long. The first part of your plan sounds odd to me, though. Wouldn't the model with the largest layer size always be the best according to your criterion? – Arne Oct 05 '20 at 08:20
  • No, I don't think so, because larger networks tend to overfit the training data and do not perform well on the validation set. I would always check that the loss on the validation set is not noticeably higher (worse) than the training loss. – AnnaLee Oct 05 '20 at 13:34
  • @AnnaLee Okay, in that case I guess it makes sense, although there is of course no guarantee that the layer size determined in this way is still optimal if you change the other hyperparameters. – Arne Oct 05 '20 at 15:44
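For reference, a minimal sketch of the grid search with cross-validation discussed in these comments; the parameter grid below is only an illustration, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Example grid: hidden layer sizes plus a couple of other hyperparameters.
param_grid = {
    "hidden_layer_sizes": [(25,), (50,), (50, 25)],  # example sizes only
    "alpha": [1e-4, 1e-3, 1e-2],                     # L2 regularization strength
    "learning_rate_init": [1e-3, 1e-2],
}

search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```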