Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag often concern how to split data with scikit-learn, where a random split into training and test sets can be computed with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
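
A minimal sketch of the helper described above. The toy data and the parameter values (test_size=0.25, random_state=0) are assumptions chosen for illustration and do not come from any question on this page.

```python
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 20 samples, one feature each,
# with alternating binary labels.
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# Hold out 25% of the samples as the test set. Passing random_state
# makes the shuffle, and therefore the split, reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(len(X_train), len(X_test))
```

With 20 samples and test_size=0.25, the split contains 15 training samples and 5 test samples. Omitting random_state would give a different shuffle on each call.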

428 questions

2 answers

How to split dataset to train, test and valid in Python?

I have a dataset like this my_data= [['Manchester', '23', '80', 'CM', 'Manchester', '22', '79', 'RM', 'Manchester', '19', '76', 'LB'], ['Benfica', '26', '77', 'CF', 'Benfica', '22', '74', 'CDM', 'Benfica', '17', '70', 'RB'], ['Dortmund',…

python scikit-learn train-test-split

asked Sep 22 '20 at 06:27

dede.brahma

3 answers

processing before or after train test split

I am using this excellent article to learn machine learning: https://stackabuse.com/python-for-nlp-multi-label-text-classification-with-keras/ The author tokenized the X and y data after splitting it up. X_train, X_test, y_train, y_test =…

keras scikit-learn nlp tokenize train-test-split

asked Aug 28 '19 at 13:15

shantanuo

1 answer

dimension mismatch error in CountVectorizer MultinomialNB

Before posting this question, I thoroughly read more than 15 similar topics on this board, each with somewhat different recommendations, but none of them solved my problem. OK, so I split my 'spam email' text data…

python naivebayes countvectorizer train-test-split

asked Aug 21 '17 at 19:14

Chris T.

3 answers

Randomly distribute files into train/test given a ratio

I am currently trying to make a setup script capable of setting up a workspace for me, so that I don't need to do it manually. I started doing this in bash, but quickly realized that would not work well. My next idea was to do it…

python bash text-files file-handling train-test-split

asked Aug 29 '16 at 16:17

Mønster

3 answers

Splitting datasets into train and test in julia

I am trying to split a dataset into train and test subsets in Julia. So far, I have tried using the MLDataUtils.jl package for this operation; however, the results do not match my expectations. Below are my findings and issues: Code # the inputs…

julia train-test-split

asked Feb 05 '21 at 07:18

Mohammad Saad

1 answer

What to make of a flat validation accuracy curve in a learning curve graph

While plotting a learning curve to see how well the model building was going, I realized that the validation accuracy curve was a straight line from the get-go. I thought maybe it was just due to some error in splitting the data into training and…

python machine-learning classification cross-validation train-test-split

asked Nov 17 '20 at 16:55

user7864386

1 answer

stratify argument in train_test_split vs StratifiedShuffleSplit

What is the difference between using the stratify argument in the train_test_split function of sklearn and using StratifiedShuffleSplit? Don't they do the same thing?

scikit-learn train-test-split

asked Apr 19 '20 at 05:08

Rohan Pinto

3 answers

How to split datatable dataframe into train and test dataset in python

I am using a datatable dataframe. How can I split the dataframe into train and test datasets? As with a pandas dataframe, I tried to use train_test_split(dt_df,classes) from sklearn.model_selection, but it doesn't work and I get an error. import…

python pandas dataframe train-test-split

asked Jul 21 '20 at 19:48

ibra

1 answer

Undersampling for imbalance data after train test split

I am working on a project with imbalanced data. I want to balance the data using random undersampling. I am unsure whether I should undersample after the train/test split, or undersample first and then do the train/test split. My approach…

machine-learning resampling train-test-split

asked May 22 '20 at 16:18

sarika

1 answer

Use only N Images using ImageDataGenerator from each class

There are 10 directories (labels), each with 800 images. I'm trying to use transfer learning to train my model. The data is loaded using ImageDataGenerator as shown below: train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, …

python keras training-data train-test-split

asked Feb 28 '20 at 09:36

Jedi Nerd

4 answers

train_test_split( ) method of scikit learn

I am trying to create a machine learning model using DecisionTreeClassifier. To train and test my data, I imported the train_test_split function from scikit-learn, but I cannot understand one of its arguments, called random_state. What is the…

python python-3.x machine-learning scikit-learn train-test-split

asked Sep 02 '19 at 09:19

Nafees

1 answer

How does Machine Learning algorithm retain learning from previous execution?

I am reading the Hands-On Machine Learning book, and the author talks about the random seed used during the train/test split. At one point, the author says that over time the machine will see your whole dataset. The author is using the following function for…

machine-learning train-test-split

asked May 29 '19 at 17:36

Sachin Rastogi

2 answers

Problems with diagnostics of prophet forecast

I am working with a dataset of crimes in Chicago, specifically on predicting the future crime rate in Chicago (I have data from 2012 to 2016). I generated a forecast using Facebook's prophet package. It worked very well and all…

dataset forecast train-test-split facebook-prophet

asked Feb 21 '19 at 14:07

Scrappy

1 answer

Getting Validation set from Train set by using percentage from groupby() in pandas

I have a train dataset with a multi-class target variable, category: train.groupby('category').size() 0 2220 1 4060 2 760 3 1480 4 220 5 440 6 23120 7 1960 8 64840 I would like to get the new validation dataset from…

python pandas group-by cross-validation train-test-split

asked Nov 26 '18 at 23:13

Keithx

2 answers

ML.NET TrainTestSplit random seed

I am using TrainTestSplit in ML.NET to repeatedly split my data set into a training and test set. In sklearn, for example, the corresponding function takes a seed as an input, so that it is possible to obtain different splits, but in ML.NET repeated calls…

c# train-test-split ml.net

asked Nov 15 '18 at 13:37

Petter T
