Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag may aim to better understand how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be quickly computed with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
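
As a minimal sketch of the helper described above: the toy arrays, the 25% test size, the fixed random_state, and the use of stratify are all illustrative assumptions, not part of the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 20 samples, 2 features, 2 balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 25% of the rows as a test set.
# random_state makes the shuffle reproducible; stratify=y keeps the
# class proportions roughly equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Stratification needs at least two samples per class, which is why a class with a single instance makes a stratified split fail.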

428 questions

1 answer

Scikit learn Stratified Shuffle Split does not work when one of the classes has just one instance

I am trying to split my dataset into a train and a test set using scikit-learn's stratified shuffle split, but it does not work because one of the classes has just one instance. It would be okay if that one instance goes into either of train or…

asked Aug 03 '21 at 15:48

Nilesh Kumar

1 answer

Why am I getting an error for GroupShuffleSplit (train test split)?

I have 2 datasets and am applying 5 different ML models. Dataset 1: def dataset_1(): ... ... bike_data_hours = bike_data_hours[:500] X = bike_data_hours.iloc[:, :-1].values y = bike_data_hours.iloc[:, -1].values X_train, X_test,…

python python-3.x machine-learning scikit-learn train-test-split

asked Jul 09 '21 at 18:49

Opps_0

1 answer

Should I perform train_test_split first and then GridSearchCV and then K Fold Crossvalidation?

I am having a lot of confusion between GridSearchCV and K fold Cross Validation. I know that GridSearch is only for hyperparameter optimization and K Fold will split my data into K folds and iterate over them (cv value). So should I first split my…

gridsearchcv train-test-split k-fold

asked Jun 27 '21 at 11:44

spectre

1 answer

Train test split MySQL records into views

How do I create two views in MySQL, one for training data and the other for test data, with a 70:30 split? CREATE VIEW training_data AS SELECT Posts.post_content as post_content, CASE WHEN (Posts.post_title like '%covid%corona%covid19%' or…

mysql train-test-split

asked Jun 11 '21 at 08:54

Isa french

1 answer

"ValueError: Found input variables with inconsistent numbers of samples: [40, 10]" Problem with splitting the data

I am using sample data from a Udemy course for practice. There are 51 rows in the data and I am trying to print the score of the model. The error I get is: ValueError: Found input variables with inconsistent numbers of samples: [40,…

python machine-learning scikit-learn train-test-split

asked May 19 '21 at 17:30

cagatay.e.sahin

0 answers

Getting same feature transformation via PCA for test set fails

In an ML project you first separate out your train and test data sets, and you carry out all your transformations on the train data set to make sure information leakage doesn't take place. To be more precise: X_train, X_test, y_train, y_test =…

python-3.x machine-learning pca train-test-split

asked May 13 '21 at 01:14

add-semi-colons

1 answer

train_test_split exception with 2D labels as stratify array

I'm trying to use the train_test_split function by providing the labels array that is a 2-d array for stratifying, with only 0 or 1 values (i.e. [0,0], [0,1], [1,0] or [1,1] are the four possible labels). I cannot rename labels (e.g. to 1,2,3,4 for…

python scikit-learn multilabel-classification train-test-split

asked May 03 '21 at 11:46

ChessMateK

1 answer

Why random_state differs in train_test_split of scikit-learn

I've been writing some code for a credit card fraud detection problem using scikit-learn. I used train_test_split to split my data into training, test and validation data…

python machine-learning scikit-learn train-test-split

asked Apr 20 '21 at 21:40

Muhammad Bilal

1 answer

Random Forest Train Test Split Accuracy

I am working through a random forest model for the first time and have come across an issue with my accuracy quantification. Currently, I split the dataset (30% as test size), fit the model, then predict y values based on my model, and score the…

python model random-forest decision-tree train-test-split

asked Apr 15 '21 at 21:09

MaxDawg27

1 answer

Using Catboost Classifier to convert categorical columns

I'm trying to apply CatBoost to one of my columns for categorical features but get the following error: CatBoostError: Invalid type for cat_feature[non-default value idx=0,feature_idx=2]=68892500.0 : cat_features must be integer or string, real number…

python machine-learning train-test-split catboost

asked Apr 13 '21 at 08:02

AJ.

1 answer

ValueError: Found input variables with inconsistent numbers of samples: [1319, 245]

I am facing issues related to train_test_split: final = [] final.append(dataset) final.append(dataset1) X = dataset[:,0:2] y = dataset1[:,2] X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15,…

python scikit-learn train-test-split

asked Mar 23 '21 at 20:31

Syed Ali Abbas

3 answers

How can I do a train test split in scikit-learn?

Does anyone know what the problem is? x=np.linspace(-3,3,100) rng=np.random.RandomState(42) y=np.sin(4*x)+x+rng.uniform(size=len(x)) X=x[:,np.newaxis] from sklearn.model_selection import train_test_split X_train, X_test, y_train,…

python scikit-learn train-test-split

asked Mar 23 '21 at 15:31

Reza

1 answer

Create random train-test split of defined proportion while maintaining exclusivity of one attribute in each set

I have multiple sets of different lengths and I wish to randomly sort these sets into two supersets such that: any one set appears in only one superset, and the sum of the lengths of all sets in a superset is as close as possible to a defined…

python scikit-learn dataset data-science train-test-split

asked Mar 16 '21 at 19:36

Matthew Newall

1 answer

Is there a way to solve this error concerning StratifiedShuffleSplit?

I am a newbie in ML and I have been trying out the Udacity ML project. However, I got an error that I am having a hard time solving. The code seems okay but I can't seem to iterate through the data. I know that it's to do with the new…

python scikit-learn train-test-split

asked Mar 07 '21 at 08:37

Isaac Wobomba

0 answers

How to apply Word2Vec with an SVM

I am not sure how to fit my SVM model with a Word2Vec training data set. What should I put instead of the question mark in the code below? model = gensim.models.Word2Vec(sentences= df['meaningful_words']) Train_X, Test_X, Train_Y, Test_Y =…

svm word2vec train-test-split

asked Feb 23 '21 at 16:16

Pegah
