
I have implemented a neural network in Processing using a supervised learning method. What I'm actually doing is training some circles to move to their target positions.

My code works; however, I found that many people split their data into 3 sets in order to find the validation error in relation to the training error (the performance of the network):

 1. Training Set
 2. Validation Set
 3. Test Set

My program includes a set of training inputs and training outputs. My network is trained (loop) until it has achieved a specific target.

For example, I have 5 inputs and 5 target outputs.

Input values  {0.2, 0.1, 0.15, 0.11, 0.01}
Target values {1, 1, 1, 1, 1}

Set Learning rate = 1
Set momentum = 1
Set bias     = 1

A) Create 5 random weights between 0 and 1, and 5 bias weights between 0 and 1.

random_Weight = random(0, 1)
Bias_Weight   = random(0, 1)

B) Calculate sum

Sum = (input_data  * random_Weight) + (Bias * Bias_Weight)

C) Pass the sum through the sigmoid (logistic) function

Output = 1.0 / (1.0 + exp(-1 * sum))

D) Calculate the error:

Error = Target values - output

E) Adjust the weights

Change = (learning rate * weight * input values) + (momentum * change)
weight = weight + change

F) Repeat until the target position equals the current position.
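
Put together, steps (A) to (F) look roughly like the minimal sketch below (plain Java, so it is self-contained; in Processing the Math.* calls can be swapped for random() and exp()). In step (E) the sketch assumes the usual delta rule, i.e. the update uses the error rather than the weight, so treat that line as an assumption rather than a literal copy of my formula above.

    // Minimal sketch of steps (A)-(F): one weight and one bias weight per training pair.
    public class TinyTrainer {
        public static void main(String[] args) {
            double[] input  = {0.2, 0.1, 0.15, 0.11, 0.01};
            double[] target = {1, 1, 1, 1, 1};

            double learningRate = 1.0, momentum = 1.0, bias = 1.0;

            // (A) random weights and bias weights between 0 and 1
            double[] weight     = new double[input.length];
            double[] biasWeight = new double[input.length];
            double[] change     = new double[input.length];   // previous update, needed for momentum
            for (int i = 0; i < input.length; i++) {
                weight[i]     = Math.random();
                biasWeight[i] = Math.random();
            }

            // (F) repeat until the output is close enough to the target (or a loop limit is hit)
            for (int epoch = 0; epoch < 100000; epoch++) {
                double totalError = 0;
                for (int i = 0; i < input.length; i++) {
                    double sum    = input[i] * weight[i] + bias * biasWeight[i];   // (B) weighted sum
                    double output = 1.0 / (1.0 + Math.exp(-sum));                  // (C) sigmoid
                    double error  = target[i] - output;                            // (D) error
                    totalError   += error * error;
                    // (E) delta rule with momentum -- assumption: error term instead of weight
                    change[i]  = learningRate * error * input[i] + momentum * change[i];
                    weight[i] += change[i];
                }
                if (totalError < 1e-4) break;
            }
        }
    }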

I don't know in advance how many times my train function will repeat until the expected target is achieved. How can I divide the data into the 3 sets (train, validation, test)?

If I understand correctly, I have implemented only the training set.

My questions are: [a] How do I create a validation set for this specific problem?

[b] What is the data set here, and [c] how can I split it to obtain a validation set in this case?

[d] How do I calculate the validation error?

[e] Is there any recommended documentation on this procedure where I could start?

Apollon1954

1 Answer


Ad "performance of network"

Let's first demystify the terms. A network has no "performance" per se. Any predictor derived from a functional minimiser ( classical ANNs are such a case ) has some "mathematically formulated" error of how well it models the Reality. However, the main issue is not how "closely" the model simulates the observed ( and ad-hoc emulated ) Reality ( having been trained by the example-result pairs [observations] used in training ), BUT how "well" the "trained" model will handle the not yet observed ( thus not pre-trained ) examples.

Thus the ANN-model's ability to "generalise", and not to suffer from "bias" or "overfitting", is the main qualitative sign of the predictor's "performance" ( the quality of its predictions ).

Ad [a]+[b]+[c]

For the sake of the SUPERVISED learner, your chances are quite high. Prepare, develop or otherwise acquire a reasonable amount of examples ( observations ) from your problem domain.

As [e] will teach you, there is a certain rationale, given the problem domain, for what a reasonable amount is ( and in brief: it depends on the intrinsic behaviour of the Reality your ANN is trying to simulate, thus it is not known a-priori, so no need to panic prematurely ).

Take responsible SUPERVISED-learning / data-sanity efforts to make sure your ANN-learner will not train against wrong/noisy [ example, correct_result ]-pairs in any row of your available DataSET. This is more important than one may expect at first sight. Yes, it is human to err, however the Hell is not forgiving, and getting any AI/ML-learner pre-trained to repeat and repeat and repeat your errors is nothing but a waste of the chance to use the powers of AI.

Having a DataSET of, say, 2000 observations ( [x_1.1, x_2.1, x_3.1, ..., x_n.1], [y_1] ) available, the classics recommend splitting the DataSET ( row-wise ) into three sub-sets:

  1. aTrainingDataSET: having some 60%-70% of the records ( observation pairs )
  2. aTestingDataSET: having some 20%-30% records
  3. aCrossValidationDataSET: having some 20%-30% records ( not seen in the training phase )

This gnostically-fair method allows you to assess how well the trained ANN has "generalised": whether it handles not only the examples it was trained on, but also the not-in-training examples ( for which you know the correct result a-priori, otherwise you would not be able to fulfil your SUPERVISED-learning role ). This detects your AI-learner's ability to survive the real-world "evolution(s)" not contained in your first few examples ( those from aTrainingDataSET ).
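
As a rough illustration only ( assuming each observation is stored as one row of inputs plus the known correct result ), such a row-wise split could look like the sketch below; the 60 / 20 / 20 proportions are just one common choice:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Row-wise split of a DataSET into training / testing / cross-validation parts.
    public class DataSplit {
        public static void main(String[] args) {
            List<double[]> dataSet = new ArrayList<double[]>();   // each row: { x_1 .. x_n, y }
            // ... fill dataSet with your [ example, correct_result ] observation pairs ...

            Collections.shuffle(dataSet);                         // randomise the row order first

            int n        = dataSet.size();
            int trainEnd = (int) (0.6 * n);                       // ~60% -> aTrainingDataSET
            int testEnd  = (int) (0.8 * n);                       // next ~20% -> aTestingDataSET

            List<double[]> aTrainingDataSET        = dataSet.subList(0, trainEnd);
            List<double[]> aTestingDataSET         = dataSet.subList(trainEnd, testEnd);
            List<double[]> aCrossValidationDataSET = dataSet.subList(testEnd, n);   // ~20%, never seen in training
        }
    }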

So, feed your ANN-learner with the training part of the DataSET to find the ( ideally convex ) minimiser-driven optimum/threshold output ( ANN-settings ).

Test the pre-trained ANN-learners with different [C,gamma] non-DataSET-related ( hyperparameter ) settings against aTestingDataSET to find how precise the ANN-model predictions are over the [C,gamma] landscape of the ANN-model.

Finally, evaluate how well the "best"-tuned, [C,gamma]-adjusted learners ( so far the candidates that best reflect both aTrainingDataSET & aTestingDataSET ) predict answers on aCrossValidationDataSET, which none of them has been exposed to so far [ this part of your SUPERVISED ( pre-rated ) experience was withheld from them precisely for this very important possibility ].

This way you may manage to steer your ANN-model clear of the traps of both "high bias" and "overfitting", the two principal black holes that a trivially ( mathematically ) formulated "penalty"-based minimiser would otherwise have no choice but to run your ANN-model into.

Ad [d]

In the sense of the above methodology, the cross-validation prediction error ( a penalty function ) is calculated for any pre-trained, [C,gamma]-adjusted ANN-learner once it is exposed to aCrossValidationDataSET.
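
There is no single prescribed formula; a common choice is the mean squared error over the held-out rows. A method sketch, assuming the simple single-input neuron from the question, where each held-out row holds { input, target }:

    // Mean squared error of a trained neuron over a held-out (validation) set.
    // rows: each { input, target }; weight / biasWeight / bias: the values found during training.
    static double validationError(double[][] rows, double weight, double biasWeight, double bias) {
        double sum = 0;
        for (double[] row : rows) {
            double net    = row[0] * weight + bias * biasWeight;   // same forward pass as in training
            double output = 1.0 / (1.0 + Math.exp(-net));          // sigmoid
            double error  = row[1] - output;                       // target - prediction
            sum          += error * error;
        }
        return sum / rows.length;                                  // average squared prediction error
    }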

Ad [e]

There are many marvelous hands-on courses on ANN/ML, so do not hesitate to join and visit.

If you ever have a chance to follow any lecture / course from Prof. Andrew Ng, be it from his Stanford phase, his Coursera phase or his recent AI/ML-Lab phase of his immense efforts in popularising this practical use of ANN-learners, that would be my choice No. 1 to follow.

UPDATE

Ad an Add-On question on sizing of aTrainingDataSET

Supposing the statement about having -5- training examples available for the ANN training process ( in the question/comment below ) is correct, there seems to be a problem in case there are some principal restrictions against having a more potent DataSET ( a larger base of SUPERVISED-learning examples ).

A typical ANN "consumes" a lot of training samples so as to allow its minimiser-based algorithm to evolve and stabilise its best-performing internal settings ( the operating principle of ANNs ), thus maximising its ability to answer ( predict ) questions as close to the correct answer as possible.

While there might be some value in just -5- [example-answer] pairs, that is too few to train an ANN-based learner well enough to serve its future task ( the predictor role ).

Try to acquire / generate / prepare a more robust DataSET for this initial process of developing a solid ANN-based predictor.

user3666197
  • Thank you for your answer. It's the first time I've done something with an ANN, so I'm not sure if I understand correctly. My program includes a set of training inputs and training outputs. My network is trained (loop) until it has achieved a specific target. For example I have 5 inputs (x1,y1) and 5 target outputs (tx1,ty1). I don't know how many times my train function will repeat until the expected target is achieved. How can I divide the data into the 3 sets? (train, validation, test). – Apollon1954 Jan 04 '15 at 18:32
  • I need reputation 15 to give an upVote. – Apollon1954 Jan 04 '15 at 18:41
  • Be patient, George, you will have it soon. Enjoy the subject in the meantime -- what ANN-architecture do you employ { inputNODEs : hiddenLayer1NODEs : hiddenLayer(2+)NODEs : outputNODEs } w/wo biasNODEs and how does a sample of your SUPERVISED learning examples/answers look like? – user3666197 Jan 04 '15 at 18:50
  • My program is very simple and I'm not using hidden layers. I'm using a learning rate to control the training process. (A lower value avoids overfitting but increases the number of training loops.) Moreover, I'm using the “momentum” variable to smooth the training process by adding a portion of the previous backpropagation step into the current one. – Apollon1954 Jan 04 '15 at 19:01
  • Is there any specific formula to calculate validation error? – Apollon1954 Jan 04 '15 at 22:01