Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag may ask how to split data with scikit-learn, where the train_test_split helper function quickly computes a random split into training and test sets.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
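
As a rough illustration of the helper described above, here is a minimal sketch of a stratified random split. The toy arrays, the 25% test fraction and the random_state value are assumptions made up for this example, not part of the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (invented for illustration): 20 samples, 2 features, 2 balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 25% of the rows. stratify=y keeps the class ratio roughly the same
# in both subsets, and random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Leaving out stratify gives a plain random split, and shuffle=False keeps the original row order (in which case stratify must stay None).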

428 questions

2 answers

Tensorflow auto split image

Suppose I have directories like this: full_dataset |---horse <= 40 images of horse |---donkey <= 50 images of donkey |---cow <= 80 images of cow |---zebra <= 30 images of zebra. Then I write this with TensorFlow: image_generator =…

tensorflow image-processing train-test-split

asked Mar 23 '20 at 11:31

Ichsan

1 answer

train test data split using stratify on two columns in scikit-learn

I have a dataset that I want to split into train and test so that I have data in the test set from each data source (specified in column "source") and from each class (specified in column "class"). I read about using the parameter stratify with…

scikit-learn train-test-split

asked Mar 10 '20 at 07:35

A_Matar

0 answers

Splitting the dataset for classification

I am trying to train and test a classification model; however, I don't understand why I am getting this error: ValueError: The test_size = 9 should be greater or equal to the number of classes = 11 What does this error mean? My code for splitting the…

python classification train-test-split

asked Feb 28 '20 at 11:08

Momo
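
One way this message can arise (an assumption, since the asker's code is cut off) is a stratified split whose test set would have fewer rows than there are classes. The sketch below uses made-up data with 11 classes to reproduce that situation and shows one possible fix.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 11 classes with 4 samples each (44 rows in total).
y = np.repeat(np.arange(11), 4)
X = np.arange(len(y)).reshape(-1, 1)

# 20% of 44 rows rounds up to 9 test rows, fewer than the 11 classes,
# so a stratified split cannot place every class in the test set.
try:
    train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Asking for at least one test row per class lets the split succeed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=11, stratify=y, random_state=0
)
print(len(y_test), "test rows covering", len(set(y_test.tolist())), "classes")
```

Other ways out would be a larger test fraction, more data per class, or dropping stratify if class balance in the test set does not matter.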

2 answers

Is there a way to do a stratified train/test split without shuffling the data?

I'm using time-sensitive data and would like to maintain its order while still stratifying, since I've got multiple labels. I haven't found any libraries that allow this.

machine-learning scikit-learn split data-science train-test-split

asked Feb 25 '20 at 17:15

Juanro Alvarado

0 answers

Is it possible to shuffle a dataframe while grouping by index in pandas or sklearn?

I have a dataframe df containing patient data, as shown below:

| patient_id | x | y | path | target |
|------------ |----- |----- |------ |-------- |
| 4423 | 234 | 53 | .... | 1 |
| 4423 | 259 |…

python pandas machine-learning scikit-learn train-test-split

asked Feb 24 '20 at 12:53

A Merii
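
If the goal is to keep every row of a given patient on the same side of a shuffled split (an assumption, since the question is cut off), scikit-learn's GroupShuffleSplit is one option. The sketch below uses an invented dataframe; the column names follow the excerpt, but every value is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical patient data; all values are invented for illustration.
df = pd.DataFrame({
    "patient_id": [4423, 4423, 5001, 5001, 6120, 6120, 7033, 7033],
    "x": [234, 259, 120, 131, 310, 305, 88, 92],
    "target": [1, 1, 0, 0, 1, 1, 0, 0],
})

# Shuffle and split by group: test_size is a fraction of the groups
# (patients), not of the rows, so no patient ends up in both subsets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]

print(sorted(train_df["patient_id"].unique()))
print(sorted(test_df["patient_id"].unique()))
```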

3 answers

Is it possible to train on 4 features and test using only a subset of those features?

I have trained a model on four features (Month, Day, Hour and Temperature) to predict some value. What I want to do is predict the value on the basis of the month, hour and day of the next day only, because I don't know the temp of the next day (which…

machine-learning regression training-data train-test-split

asked Feb 19 '20 at 04:37

Amna Rizvi

1 answer

K-folds: do we still need to implement train_test_split?

I've been reading quite a bit and I'm a little confused with k-folds. I understand the concept behind it, but I'm not sure about how to deploy it. The usual step that I've been seeing after data exploration is train_test_split, encoding and scaling…

machine-learning scikit-learn train-test-split k-fold

asked Jan 20 '20 at 15:02

Jonathan
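
One common arrangement, sketched here under assumptions rather than taken from the question, is to hold out a test set with train_test_split and run k-fold cross-validation only on the remaining training rows. The synthetic data, the logistic regression model and the 5 folds are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, generated only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Keep a final test set aside; it is not touched during cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Putting the scaler in a pipeline means it is refit on the training
# folds of each split, so the validation fold does not leak into it.
model = make_pipeline(StandardScaler(), LogisticRegression())
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("mean CV accuracy:", cv_scores.mean())

# Refit on all training rows and score once on the held-out test set.
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)
print("test accuracy:", test_score)
```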

1 answer

Why do we include the target class in both the arrays in train_test_split?

X_train, test_df, y_train, y_test = train_test_split(result, y_true, stratify = y_true, test_size = 0.2) In the above sample use of train_test_split, result is the data frame and y_true is a numpy array formed from the target class column from the…

machine-learning scikit-learn train-test-split

asked Dec 29 '19 at 19:29

user12518608

2 answers

ImportError: cannot import name 'LatentDirichletAllocation'

I'm trying to import the following: from sklearn.model_selection import train_test_split and got the following error. Here's the stack trace: ImportError Traceback (most recent call last) in…

python python-3.x scikit-learn sklearn-pandas train-test-split

asked Dec 13 '19 at 10:30

Sanket Patel

1 answer

How do I predict future results with scikitlearn, pandas in Python using RandomForestRegressor method?

Hello I came across this tutorial on how to use python with some libraries to predict future NCAAB games using a sportsreference library. I will post the code as well as the article. This seems to work well, but I think it is only testing based on…

python pandas scikit-learn train-test-split

asked Dec 12 '19 at 22:33

Ryan Record

2 answers

sklearn train_test_split returns some elements in both test/train

I have a data-set X with 260 unique observations. When running x_train,x_test,_,_=train_test_split(X,y,test_size=0.2) I would assume that [p for p in x_test if p in x_train] would be empty, but it is not. Actually it turns out that only two…

scikit-learn train-test-split

asked Dec 01 '19 at 18:24

CutePoison

1 answer

How to split data by using train_test_split in Python Numpy into train, test and validation data set? The split should not be random

I want to split data category-wise into train, test and validation sets. For example: if we have 3 categories positive, negative and neutral in the dataset. The positive category should be split into train, test, and validation. And the same with the other two…

python numpy train-test-split

asked Nov 21 '19 at 08:13

user85181

1 answer

Found input variables with inconsistent numbers of samples: [24, 25]

I need assistance reshaping my input to match my output. I believe my issue is with my target variable. I am getting the error as stated in the title. I have tried .reshape and .flatten(). Please help, and thanks in advance NEnews_train = [] for…

python nlp data-science train-test-split

asked Nov 18 '19 at 16:32

Deja Bond

1 answer

How to print the classified points based on SVM classifier

I was using an "svm" classifier to classify whether it was a bike or a car. My features were columns 0, 1 and 2, and the dependent variable was the 3rd column. I can clearly see the classification, but I don't know how to print all the points based on classification in…

python machine-learning scikit-learn preprocessor train-test-split

asked Nov 17 '19 at 12:29

vimal

1 answer

Error while fitting train and test sets, train_test_split method

I am trying to evaluate my model with train_test_split. I have defined the following functions to create the output array on the table (top column) according to the input in function: def top_sh(num): ###Get the top(num) in Shanghai data and…

python machine-learning scikit-learn valueerror train-test-split

asked Nov 13 '19 at 12:44

programmerA
