Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag can aim at better understanding how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be quickly computed with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
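
As a minimal sketch of the helper described above, the following splits a small toy data set into train and test subsets. The array contents, the 80/20 ratio, and the fixed random_state are assumptions chosen only for illustration; they are not part of the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative assumption): 10 samples, 2 features, binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the samples for testing.
# random_state makes the shuffle reproducible;
# stratify=y keeps the class proportions equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```

By default the function shuffles the rows before splitting. For ordered data such as time series, passing shuffle=False (without stratify) keeps the original order, so the first rows go to the training set.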

428 questions

votes

2 answers

Big difference in score (10%) between a split_test_train and a cross validation

I'm working on a classification problem with 2,500 rows, 25,000 columns, and 88 different, unevenly distributed classes. And then something very strange happened: when I run a dozen different train/test splits, I always get scores around 60%... And when I run cross…

asked Jul 10 '20 at 10:44

Arnaud Hureaux

votes

2 answers

sklearn train_test_split confusion

I am getting an error when running my code. What could be the possible error? X = [['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility', 'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Establishment_Year',…

machine-learning scikit-learn train-test-split

asked Jun 26 '20 at 11:22

Darshika Verma

votes

2 answers

Facing an IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I have been working on a link prediction problem in which the data set, which is a numpy array, has to be parsed and stored into another numpy array. I am trying to do this, but at the 9th line it throws an IndexError: only integers, slices (:),…

numpy machine-learning train-test-split

asked Jun 23 '20 at 18:25

datta subrahmanyam

votes

1 answer

Numpy split array without copying

I have a very large array of images (multiple GBs) and want to split it using numpy. This is my code: images = ... # this is the very large array which contains a lot of images. images.shape => (50000, 256, 256) indices = ... # array containing…

python numpy training-data train-test-split

asked Jun 10 '20 at 14:13

Codey

votes

1 answer

How to split dataset into train validate test sets correctly, in simple clear way?

I have a dataset with 100 samples. I want to split it into 75%, 25%, 25% for Train, Validate, and Test respectively, then I want to do that again with different ratios such as 80%, 10%, 10%. For this purpose, I was using the code below, but I…

python-3.x scikit-learn dataset train-test-split

asked May 25 '20 at 05:30

Bilal

votes

2 answers

TPOT: TPOTRegressor is showing NameError

Here is the code I am running about TPOTRegressor. from tpot import TPOTRegressor from sklearn.datasets import load_boston from sklearn.model_selection import train_test_split housing = sklearn.datasets.load_boston() X_train, X_test, y_train,…

scikit-learn python-3.7 nameerror train-test-split tpot

asked May 14 '20 at 17:08

sdatta

votes

1 answer

How can I split the data into test and train without using function train_test_split?

I am currently working on time series forecasting, and I have to split the data into a training set and a test set (with the first 70% of the data in the training set). However, I cannot use the train_test_split function because it will shuffle…

python scikit-learn train-test-split

asked May 14 '20 at 14:08

Cookie Monster

votes

1 answer

Python 1D CNN model - Error in train_test_split

I'm trying to build a 1D CNN model by processing ECG signals to diagnose sleep apnea. I am using the sklearn library and encountered an error in train_test_split. Here is my code: # loading the file with open("ApneaData.csv") as csvDataFile: …

python machine-learning scikit-learn train-test-split

asked May 08 '20 at 11:25

Sakshi Kumar

votes

0 answers

Why is predict in R taking Train data instead of Test data?

I am trying to build an SVM model using a linear kernel. I have divided my 100,000 records into train and test: 70,000 in train and 30,000 in test. model<-…

r svm prediction train-test-split

asked May 04 '20 at 20:10

Kanishk Jain

votes

1 answer

How can I use the test_proportion data in a machine learning model?

I have data with 4000 CNN features and it is a binary classification problem. All I know about the test data is the proportions of 1 and 0. How can I tell my model to predict test labels by using the proportions data? (Like is there a way to…

python machine-learning classification random-forest train-test-split

asked May 02 '20 at 11:16

Ege

votes

1 answer

Thoughts about train_test_split for machine learning

I just noticed that many people tend to use train_test_split even before handling the missing data, so they seem to split the data at the very beginning, and there are also a bunch of people who tend to split the data right before model building…

machine-learning train-test-split

asked Apr 16 '20 at 05:30

YOU WANG

votes

1 answer

If y_test data is the predicted results, how can I see the actual results?

I'm trying to make a confusion matrix to determine how well my model performed. I split my data into x and y testing and training sets. However, to make my confusion matrix, I need the y_test data (the predicted data) and the actual data. Is there a…

python training-data confusion-matrix train-test-split

asked Apr 13 '20 at 19:45

ralph_cifarello

votes

2 answers

How to use stratify for single column

I am very new to this data stuff. That's why I might not be sure how I should word my question. I am trying to express my issue as simply as possible. I am showing part of my code. print(data) Output: array([[0, 0, 0, ..., 255, 255, 255], …

python machine-learning scikit-learn conv-neural-network train-test-split

asked Apr 04 '20 at 12:59

user1896653

votes

1 answer

How to solve NameError: name 'n' is not defined in train_test_split of scikit-learn 0.22 version without downgrading the version?

I am doing sentiment analysis and using the scikit-learn train_test_split function. But I am getting NameError: 'n' is not defined even though I have defined it. After checking various forums I found out that this error is pertaining in the new versions…

python scikit-learn python-3.7 nameerror train-test-split

asked Mar 31 '20 at 06:48

Piyush Ghasiya

votes

1 answer

How do I standardize only int64 columns after train-test split?

I have a dataframe ready for modelling. It contains continuous variables and one-hot-encoded variables: ID Limit Bill_Sep Bill_Aug Payment_Sep Payment_Aug Gender_M Gender_F Edu_Uni DEFAULT_PAYMT 1 10000 2000 350 1000 …

python pandas feature-extraction train-test-split

asked Mar 28 '20 at 09:17

wjie08
