Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag may aim at better understanding how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.
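A minimal sketch of the helper in use, on made-up toy arrays (the data and the 25% test fraction are illustrative assumptions). Each row of `X` stays paired with its label in `y` after the shuffle:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 20 samples with 2 features; row i is [2i, 2i + 1] and its label is i.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Hold out 25% of the rows for testing; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (15, 2) (5, 2)

# Rows and labels are shuffled together, so every pair stays aligned.
print((X_train[:, 0] // 2 == y_train).all())  # True
```

Passing `stratify=y` additionally keeps the class proportions the same in both subsets.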

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

428 questions
0 votes, 2 answers

Big difference in score (10%) between a split_test_train and a cross validation

I'm working on a classification problem with 2,500 lines, 25,000 columns, and 88 different classes, unevenly distributed. And then something very strange happened: when I run a dozen different train/test splits, I always get scores around 60%... And when I run cross…
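One way to compare the two estimates on an equal footing is to look at the spread of scores rather than at a single number. The sketch below uses synthetic data from `make_classification` and a `LogisticRegression` model as stand-ins (both assumptions; the asker's data and model are not shown) and stratifies both procedures, which can matter when classes are unevenly distributed:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in data (assumption): the asker's data set is not available.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_classes=3, random_state=0
)
model = LogisticRegression(max_iter=1000)

# Accuracy from ten single random train/test splits.
split_scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    split_scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))

# Shuffled, stratified 5-fold cross-validation: each sample is tested exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X, y, cv=cv)

print(f"single splits: mean={np.mean(split_scores):.3f}, std={np.std(split_scores):.3f}")
print(f"5-fold CV:     mean={cv_scores.mean():.3f}, std={cv_scores.std():.3f}")
```

If the two means still differ by much more than their spreads, one thing worth checking is whether the rows are ordered (for example, sorted by class): an integer `cv` in `cross_val_score` uses unshuffled folds for classifiers, while `train_test_split` shuffles by default.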
0 votes, 2 answers

sklearn train_test_split confusion

I am getting an error when running some code. What could be causing it? X = [['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility', 'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Establishment_Year',…
0 votes, 2 answers

Facing an IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I have been working on a link prediction problem in which the data set, which is a NumPy array, has to be parsed and stored in another NumPy array. I am trying to do this, but at the 9th line it throws an IndexError: only integers, slices (:),…
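That message is what NumPy raises when an array is indexed with something other than an integer, slice, ellipsis, `None`, or an integer/boolean array. A frequent cause (an assumption here, since the asker's code is truncated) is a float index, for example the result of `/` division:

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
idx = len(arr) / 2  # true division returns a float (2.0), not an int

raised = False
try:
    arr[idx]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Casting to int, or using floor division //, gives a valid integer index.
print(arr[int(idx)])        # 30
print(arr[len(arr) // 2])   # 30
```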
0 votes, 1 answer

Numpy split array without copying

I have a very large array of images (multiple GBs) and want to split it using numpy. This is my code: images = ... # this is the very large array which contains a lot of images. images.shape => (50000, 256, 256) indices = ... # array containing…
Codey
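For context, NumPy's basic slicing returns views that share memory with the original array, while indexing with an integer array (such as an `indices` array) always returns a copy. A sketch on a small stand-in array (the shape and the 80/20 cut are assumptions), checked with `np.shares_memory`:

```python
import numpy as np

# Small stand-in for the large image array (assumption: shape (N, H, W)).
images = np.arange(100 * 8 * 8, dtype=np.int32).reshape(100, 8, 8)

# Basic slicing returns views; np.split also returns views. Nothing is copied.
train_view, test_view = np.split(images, [80])
print(np.shares_memory(images, train_view))  # True

# Indexing with an integer array ("fancy" indexing) always returns a copy.
rng = np.random.default_rng(0)
indices = rng.permutation(len(images))
train_copy = images[indices[:80]]
print(np.shares_memory(images, train_copy))  # False

# One way to get a random split that still uses views: permute the rows in place
# once, then slice. Generator.shuffle shuffles along the first axis in place.
rng.shuffle(images)
train_view, test_view = images[:80], images[80:]
print(np.shares_memory(images, train_view))  # True
```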
0 votes, 1 answer

How to split dataset into train validate test sets correctly, in simple clear way?

I have a dataset with 100 samples. I want to split it into 75%, 25%, 25% for Train, Validate, and Test respectively, then I want to do that again with different ratios such as 80%, 10%, 10%. For this purpose, I was using the code below, but I…
Bilal
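A common pattern is to call `train_test_split` twice: first to hold out the test set, then to split the remainder into training and validation sets. The three shares have to add up to 100% of the data. The helper `train_val_test_split` below is hypothetical (not part of scikit-learn); it converts the fractions to absolute counts first, which avoids floating-point rounding surprises when the second split's fraction would otherwise have to be rescaled:

```python
import numpy as np
from sklearn.model_selection import train_test_split


def train_val_test_split(X, y, val_size, test_size, random_state=None):
    """Hypothetical helper: split X, y by fractions of the *full* data set."""
    n_samples = len(X)
    # Convert fractions to absolute counts; train_test_split accepts an int test_size.
    n_test = round(test_size * n_samples)
    n_val = round(val_size * n_samples)
    # First split: hold out the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=n_test, random_state=random_state
    )
    # Second split: carve the validation set out of what remains.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=n_val, random_state=random_state
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# 100 toy samples, matching the size mentioned in the question.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

for val_size, test_size in [(0.1, 0.1), (0.15, 0.15)]:
    parts = train_val_test_split(X, y, val_size, test_size, random_state=0)
    print([len(p) for p in parts[:3]])  # [80, 10, 10], then [70, 15, 15]
```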
0 votes, 2 answers

TPOT: TPOTRegressor is showing Name Error

Here is the code I am running with TPOTRegressor. from tpot import TPOTRegressor from sklearn.datasets import load_boston from sklearn.model_selection import train_test_split housing = sklearn.datasets.load_boston() X_train, X_test, y_train,…
0 votes, 1 answer

How can I split the data into test and train without using function train_test_split?

I am currently working on time-series forecasting, and I have to split the data into a training set and a test set (the first 70% of the data should be in the training set). However, I cannot use the train_test_split function because it will shuffle…
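For time-ordered data, the usual approach is to cut at a position rather than shuffle. Two sketches on a stand-in series (the `series` array is an assumption): plain slicing, and `train_test_split` with `shuffle=False`, which keeps the original order:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for 100 time-ordered observations (assumption).
series = np.arange(100)

# Option 1: slice at the 70% mark, so the first 70% is the training set.
cut = round(len(series) * 0.7)
train, test = series[:cut], series[cut:]

# Option 2: train_test_split with shuffle=False also keeps chronological order.
train2, test2 = train_test_split(series, test_size=0.3, shuffle=False)

print(len(train), len(test))    # 70 30
print(train2[-1] < test2[0])    # True: every training point precedes every test point
```

For cross-validation on such data, scikit-learn's `TimeSeriesSplit` produces successive folds in which the test indices always come after the training indices.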
0 votes, 1 answer

Python 1D CNN model - Error in train_test_split

I'm trying to build a 1D CNN model by processing ECG signals to diagnose sleep apnea. I am using the sklearn library and encountered an error in train_test_split. Here is my code: # loading the file with open("ApneaData.csv") as csvDataFile: …
0 votes, 0 answers

Why is predict in R taking Train data instead of Test data?

I have been trying to build an SVM model using a linear kernel. I have divided my 100,000 records into train and test: 70,000 in train and 30,000 in test. model<-…
Kanishk Jain
0 votes, 1 answer

How can I use the test_proportion data in a machine learning model?

I have data with 4,000 CNN features, and it is a binary classification problem. All I know about the test data is the proportions of 1s and 0s. How can I tell my model to predict test labels by using the proportion data? (Like is there a way to…
0 votes, 1 answer

Thoughts about train_test_split for machine learning

I just noticed that many people tend to use train_test_split even before handling missing data; they seem to split the data at the very beginning. There are also a bunch of people who tend to split the data right before model building…
YOU WANG
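Whatever the order of the other steps, a widely used rule is that statistics used for preprocessing (for example, the means used to fill missing values) should be learned from the training split only and then applied to the test split. A sketch with scikit-learn's `SimpleImputer` inside a pipeline, on synthetic data with randomly removed values (the data, the mean strategy, and the model are assumptions):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption): 200 rows, 3 features, label depends on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan  # blank out roughly 10% of the values

# Split first, before any statistics are computed.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The imputer learns column means from X_train during fit; the same means are
# reused when the pipeline scores X_test, so test rows never influence them.
model = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```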
0 votes, 1 answer

If y_test data is the predicted results, how can I see the actual results?

I'm trying to make a confusion matrix to determine how well my model performed. I split my model into x and y testing and training sets; however, to make my confusion matrix, I need the y_test data (the predicted data) and the actual data. Is there a…
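For reference, the `y_test` returned by `train_test_split` holds the actual (true) labels of the held-out rows; predicted labels come from calling the fitted model on `X_test`. A sketch using scikit-learn's bundled iris data and a logistic regression (both stand-ins for the asker's data and model):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# y_test: the true labels held out by the split.
# y_pred: the model's predictions for the same rows.
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```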
0 votes, 2 answers

How to use stratify for single column

I am very new to this data stuff, so I might not be sure what I should write as my question. I am trying to express my issue as simply as possible. I am showing part of my code. print(data) Output: array([[0, 0, 0, ..., 255, 255, 255], …
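If the goal is to keep the class proportions of one column the same in the train and test sets, that single column can be passed as `stratify`. A sketch on a made-up array whose last column is the label (the array and the choice of column are hypothetical); with a pandas DataFrame the equivalent is `stratify=df["column_name"]`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows, one feature column plus a label column (15 zeros, 5 ones).
data = np.column_stack([np.arange(20), np.array([0] * 15 + [1] * 5)])
features = data[:, :-1]
labels = data[:, -1]

# stratify takes a single 1-D array: the column whose proportions should be preserved.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

print(np.bincount(y_train), np.bincount(y_test))  # [12  4] [3 1]
```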
0 votes, 1 answer

How to solve NameError: name 'n' is not defined in train_test_split of scikit-learn version 0.22 without downgrading?

I am doing sentiment analysis and using scikit-learn's train_test_split function, but I am getting NameError: 'n' is not defined even though I have defined it. After checking various forums, I found out that this error occurs in the new versions…
0 votes, 1 answer

How do I standardize only int64 columns after train-test split?

I have a dataframe ready for modelling; it contains continuous variables and one-hot-encoded variables: ID Limit Bill_Sep Bill_Aug Payment_Sep Payment_Aug Gender_M Gender_F Edu_Uni DEFAULT_PAYMT 1 10000 2000 350 1000 …
wjie08
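A sketch with `ColumnTransformer` and `make_column_selector(dtype_include="int64")`, which scales only the int64 columns and passes the other columns (here, a uint8 one-hot column) through unchanged. The scaler is fitted on the training split and then reused on the test split. The small DataFrame is made up and only loosely modelled on the question's columns:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up frame: int64 continuous columns, a uint8 one-hot column, and a target.
df = pd.DataFrame({
    "Limit": [10000, 20000, 50000, 30000, 15000, 40000, 25000, 35000],
    "Bill_Sep": [2000, 500, 12000, 3000, 800, 9000, 4000, 6000],
    "Gender_M": np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=np.uint8),
    "DEFAULT_PAYMT": [0, 1, 0, 0, 1, 0, 1, 0],
})
X = df.drop(columns="DEFAULT_PAYMT")
y = df["DEFAULT_PAYMT"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = ColumnTransformer(
    [("scale_int64", StandardScaler(), make_column_selector(dtype_include="int64"))],
    remainder="passthrough",  # non-int64 columns (the uint8 one-hot column) pass through as-is
)
X_train_scaled = scaler.fit_transform(X_train)  # means and stds learned from X_train only
X_test_scaled = scaler.transform(X_test)        # the same means and stds reused on X_test

# Output columns: scaled Limit, scaled Bill_Sep, then the untouched Gender_M.
print(X_train_scaled.round(2))
```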