Can Training dataset and testing data set be seperate instead of split

Question

Can we have a separate data set for both Training and Testing. I'm working on a project to pick up effective test case AS part of this, i analyse the bug database and come up with triggers which have yielded bug and arrive at a model . So this bug database forms my training set. The test cases that I have written is my test data and i have to supply this test data to the model to say if the test case is effective or not. so in this case, instead of splitting the dataset into training and test data, i have to have two different data sets (test data from bug database) and training data (test cases generated manually) is this something do-able using Machine Learning? Kindly let me know.

I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). — desertnaut, May 17 '21 at 16:51

score 2 · Accepted Answer · answered May 17 '21 at 16:47

2

Yes, the training dataset and testing dataset can be separate files. In real world cases, the testing data is generally some separate unseen dataset.

The main principle to follow is that when training the model, a dataset must be kept separate (hold out set) for testing. This data can be provided separately in different files, databases or even generate using splits. This is done to avoid data leaks (when testing data is somehow used for training the model).

answered May 17 '21 at 16:47

skillsmuggler

1,862
1
11
16

Thanks a lot for the info. So from the bug analysis , we should be able to classify the data (multi-class) classification and arrive at a model to which the Test case (Training set) can be fed to get the effective test cases. – Sai May 17 '21 at 17:08
Yes, train on your training dataset and use the trained model to make prediction on the testing data. – skillsmuggler May 18 '21 at 10:48

Can Training dataset and testing data set be seperate instead of split

1 Answers1