What's the difference between a test set and a train set?

Question

I'm voting to close this question as off-topic because it's not about programming. (Apr 30 '18 at 00:53)

Chetan_Vasudevan · Answer 1 · 2017-12-21T16:23:34.043

To understand the difference, it helps to know three related concepts:

a. training set

b. validation set

c. test set

Whenever you want to apply a learning algorithm to a data set, you should split it into the three sets above.

a. Training set: usually around 60% of your original data set. It contains examples whose target and predictor variables are already known (pre-classified), and it is used to fit the model's parameters.

b. Validation set: usually around 20%. It is used to check what the model has learned so far: the model's predictions are compared against the known, pre-classified targets of data it was not trained on. This gives an unbiased evaluation of a model fit on the training set while you tune settings or choose between models. (Cross-validation in statistics is a related technique that repeats this check over several different splits.)

c. Test set: usually around 20%. Here we apply the chosen prediction algorithm to data it has never seen, to get an idea of how it will perform. It is not good to use the same data for both training and testing, because that would not tell us how well the model generalizes or whether over-fitting has happened. Hence we need to keep the sets separate.

The exact proportions vary: besides 60-20-20, a 70-15-15 split is also common.
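As a rough illustration of the three-way split described above, here is a minimal sketch using only the Python standard library. The function name `split_dataset`, its parameters, and the stand-in data are made up for this example; in practice a library helper such as scikit-learn's `train_test_split` is often used instead.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into train / validation / test lists.

    Hypothetical helper for illustration; the fractions are the
    60-20-20 rule of thumb, not a requirement.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains becomes the test set
    return train, val, test

data = list(range(100))  # stand-in for 100 labelled examples
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Calling `split_dataset(data, train_frac=0.7, val_frac=0.15)` would give a 70-15-15 split instead. Shuffling before cutting matters when the original data is ordered, for example sorted by class.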

score 2 · Accepted Answer · answered Dec 20 '17 at 20:04

The difference is simple.

In general you can put 70% of your data in the train set and 30% in the test set (80/20 is also possible).

The train set is the data with which you train your model (classification, regression). Once the model has learned some generalized rules, you apply them to your test set and check how many of the test examples were predicted correctly. I hope this was helpful!
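A small sketch of that workflow, assuming a 70/30 split and a deliberately simple nearest-centroid classifier. The synthetic data, the seed, and the helper names `fit_centroids` and `predict` are invented for this example and are not from any particular library.

```python
import random

rng = random.Random(42)

# Synthetic, made-up data: (feature, label) pairs from two well-separated groups.
rows = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)] + \
       [(rng.gauss(10.0, 1.0), 1) for _ in range(50)]
rng.shuffle(rows)

# 70/30 split: the model only ever sees train_rows while it is being fitted.
cut = round(len(rows) * 0.7)
train_rows, test_rows = rows[:cut], rows[cut:]

def fit_centroids(samples):
    """Training step: learn one mean feature value per class."""
    centroids = {}
    for label in {lab for _, lab in samples}:
        values = [x for x, lab in samples if lab == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Fit on the train set, then count correct predictions on the held-out test set.
centroids = fit_centroids(train_rows)
correct = sum(predict(centroids, x) == label for x, label in test_rows)
accuracy = correct / len(test_rows)
print(f"test accuracy: {accuracy:.2f}")
```

The accuracy printed here is measured only on examples the model never saw during fitting, which is what makes it a fair estimate of how the model generalizes.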
