Questions tagged [train-test-split]

Questions with this tag are about splitting a machine learning data set into random train and test subsets.

In particular, questions with this tag may aim at better understanding how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be quickly computed with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
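
As a minimal sketch of the helper described above, the following splits a small toy data set into train and test subsets. The arrays and the 80/20 ratio are illustrative assumptions, not taken from any question on this page.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples, 2 features, balanced binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# test_size=0.2 holds out 20% of the rows; random_state makes the shuffle
# reproducible; stratify=y keeps the class ratio similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```

Every array passed in must have the same number of rows; a mismatch is what raises "Found input variables with inconsistent numbers of samples".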

428 questions

votes

1 answer

ValueError: y should be a 1d array, got an array of shape (74216, 2) instead

I am trying to apply logistic regression models to text. I vectorized my data with TF-IDF: vectorizer = TfidfVectorizer(max_features=1500) x = vectorizer.fit_transform(df['text_column']) vectorizer_df = pd.DataFrame(x.toarray(),…

asked Jun 28 '22 at 14:04

NivB

votes

0 answers

How to fix: 'ValueError: Found input variables with inconsistent numbers of samples'

For predicting house prices using linear regression, I am not able to train the model using model.fit() as it gives me an error. Here is my code: #importing dependencies import pandas as pd import numpy as np from sklearn.linear_model import…

scikit-learn linear-regression train-test-split

asked Jun 25 '22 at 04:17

sarahcodebyte

votes

2 answers

Scaling row-wise with MinMaxScaler from Sklearn

By default, scalers from Sklearn work column-wise. But I need my data to be scaled row-wise, so I did the following: from sklearn.preprocessing import MinMaxScaler from sklearn.model_selection import train_test_split import numpy as np # %%…

scikit-learn normalization train-test-split

asked Jun 06 '22 at 12:34

Murilo

votes

1 answer

ValueError: too many values to unpack(expected 2) - train_test_split

I'm doing test_split before the feature extraction. However, when I try to loop through any set, whether train or test, I get the following error (ValueError: too many values to unpack(expected 2)) for cls in os.listdir(path): for sound…

python train-test-split

asked Mar 15 '22 at 10:47

Ran

votes

1 answer

sklearn train_test_split on pandas

I'm a relatively new user of sklearn and have a question about using train_test_split from sklearn.model_selection. I have a large dataframe with shape (96350, 156). My dataframe has a column named CountryName that contains 160 countries, each…

python pandas scikit-learn train-test-split

asked Mar 06 '22 at 13:28

leskovecg

votes

0 answers

equivalent of sklearn's StratifiedGroupKFold for PySpark?

I have a dataframe for single-label binary classification with some class imbalance and I want to make a train-test split. Some observations are members of groups in the data that should only appear in either the test split or train split but not…

pyspark scikit-learn cross-validation training-data train-test-split

asked Oct 23 '21 at 07:15

michen00

votes

1 answer

Stratified Cross Validation or Sampling for train-test split based on multiple features in python

sklearn's train_test_split, StratifiedShuffleSplit and StratifiedKFold all stratify based on class labels (y-variable or target_column). What if we want to sample based on feature columns (x-variables) and not on the target column? If it was just one…

pandas machine-learning scikit-learn cross-validation train-test-split

asked Jun 27 '21 at 18:56

Abhi25t

votes

1 answer

How do I best make 80% train, 10% validation, and 10% test splits using train_test_split in Python?

How do I best make 80% train, 10% validation, and 10% test splits using train_test_split in Python? Is there a common way to visualize this split once created? from sklearn.model_selection import train_test_split # Splitting the data by a…

python train-test-split

asked Jun 20 '21 at 21:44

iceAtNight7
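
One common way to get the 80/10/10 split asked about above is to call train_test_split twice: first hold out 20%, then halve that hold-out into validation and test. This is a sketch under assumed toy data (100 hypothetical samples), not the asker's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Step 1: keep 80% for training, set aside 20% as a temporary hold-out.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: split the hold-out in half, giving 10% validation and 10% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```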

votes

2 answers

How to split duplicate samples to train test with no overlapping?

I have an NLP dataset (about 300K samples) that contains duplicate data. I want to split it into train and test sets (70%-30%) with no overlap. For instance: |dataset: | train | test | | a | a | …

pandas machine-learning scikit-learn pytorch train-test-split

asked Apr 09 '21 at 07:12

Whisht

votes

1 answer

Difference between train_test_split and StratifiedShuffleSplit

I came across the following statement when trying to find the difference between train_test_split and StratifiedShuffleSplit: when stratify is not None, train_test_split uses StratifiedShuffleSplit internally. I was just wondering why the…

machine-learning scikit-learn train-test-split

asked Mar 23 '21 at 05:47

adiaux

votes

5 answers

YoloV4 Custom Dataset Train Test Split

I try to train a Yolo Net with my custom Dataset. I have some Images (*.jpg) and the labels/annotations in the yolo format as a txt-file. Now I want to split the data in a train and validation set. As a result I want a train and a validation folder…

python scikit-learn yolo train-test-split dataset

asked Mar 11 '21 at 09:03

Basti

votes

1 answer

train_test_split for multiple targets

I have a multi-objective problem. I have two targets ylo and yhi sharing the same features x: x = np.array([[0,1,2],[2,3,4]]) ylo = np.array([10,11]) yhi = np.array([12,13]) Is there a way to split the data to get x_train,…

python scikit-learn train-test-split

asked Feb 16 '21 at 17:21

Hud
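
For the multiple-target case above, train_test_split accepts any number of equal-length arrays and returns a train/test pair for each, shuffled with the same indices. The sketch below uses larger hypothetical arrays than the two-row example in the question so the split is visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, 3 features, and two targets per sample.
x = np.arange(30).reshape(10, 3)
ylo = np.arange(10)
yhi = ylo + 100  # constructed so row alignment is easy to check

# Passing three arrays yields six outputs, in train/test pairs per input.
x_train, x_test, ylo_train, ylo_test, yhi_train, yhi_test = train_test_split(
    x, ylo, yhi, test_size=0.2, random_state=0
)

# The same rows land in the same subset for every array.
print(np.array_equal(yhi_train, ylo_train + 100))
```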

votes

2 answers

Train test split for ensuring all categories are included in train set

Let's say there are some 20 categorical columns in the data, each having a different set of unique categorical values. Now a train test split has to be done, and one needs to ensure that all unique categories are included in the train set. How can it…

python categorical-data train-test-split

asked Dec 06 '20 at 05:54

Aroonima

votes

2 answers

How to solve sklearn error: "Found input variables with inconsistent numbers of samples"?

I have a challenge using the sklearn 70-30 division. I receive an error on line: X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y) The error is: Found input variables with inconsistent numbers of…

python data-analysis sklearn-pandas train-test-split

asked Oct 19 '20 at 22:34

Paip

votes

2 answers

Split Train Test Data sets keeping like values together

I have a data set of animal types with ID's and I want to break said data set into Test/Train data sets. I also want to keep all ID's for a respective animal within either the Train or Test data set. An example of the data is below with a random…

python data-science train-test-split

asked Oct 01 '20 at 21:19

AlmostThere
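
When all rows sharing an ID must stay on one side of the split, as in the question above, one option in scikit-learn is GroupShuffleSplit, which splits by group rather than by row. The IDs and features below are hypothetical stand-ins for the asker's animal data.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical data: 8 rows belonging to 4 animal IDs (2 rows each).
ids = np.array([1, 1, 2, 2, 3, 3, 4, 4])
X = np.arange(16).reshape(8, 2)

# test_size here is a fraction of the groups, not of the rows.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(gss.split(X, groups=ids))

X_train, X_test = X[train_idx], X[test_idx]

# No ID appears in both subsets.
print(set(ids[train_idx]) & set(ids[test_idx]))
```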
