Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning dataset into random train and test subsets.

In particular, questions with this tag may aim at better understanding how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
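Here is a minimal sketch of the helper on a toy NumPy array (the data is made up). test_size sets the held-out fraction, and random_state fixes the shuffle so the split is reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, plus one label per sample
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 30% of the rows; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```

Rows of X and y are shuffled together, so each label stays paired with its features.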

428 questions

-1 votes, 1 answer

ValueError: Found input variables with inconsistent numbers of samples: [951, 2025]

There are tons of examples of this error in which the problem is related to the dimensions of the array or how a dataframe is read. However, I'm using just a Python list for both X and Y. I'm trying to split my code into train and test with…

asked Dec 27 '22 at 05:21

Gorga Siagian
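train_test_split raises this error when the features and the labels passed to it contain different numbers of samples. The following minimal reproduction uses hypothetical lists of the reported sizes (951 and 2025) and includes a length check that is worth running before splitting:

```python
from sklearn.model_selection import train_test_split

# Hypothetical data reproducing the reported mismatch:
# 951 feature rows but 2025 labels
X = [[float(i)] for i in range(951)]
y = [i % 2 for i in range(2025)]

# Cheap sanity check before splitting: there must be one label per sample
print(len(X), len(y))  # 951 2025

try:
    train_test_split(X, y, test_size=0.2, random_state=0)
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

The fix is upstream of the split: build X and y from the same rows so their lengths match.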

-1 votes, 1 answer

How to use multiple datasets from different measurements for one prediction with machine learning models? How to split data into Train Test sets?

I'm working on Capacity prediction models for Lithium-Ion batteries. I have 10 datasets from 10 different batteries including the capacity and multiple features. Each dataset is time dependent. In the end I want to predict the capacity for a…

python machine-learning multiple-databases train-test-split

asked Oct 19 '22 at 09:37

steffi44
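For grouped measurements like the battery data, one common approach is to hold out whole groups with scikit-learn's GroupShuffleSplit. That way, rows from the same battery never end up in both sets. This is an assumption about the asker's setup, and the arrays below are random stand-ins:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Hypothetical stand-in: 10 batteries x 20 time steps, 3 features per row
n_batteries, n_steps = 10, 20
X = rng.normal(size=(n_batteries * n_steps, 3))
y = rng.normal(size=n_batteries * n_steps)           # e.g. capacity
groups = np.repeat(np.arange(n_batteries), n_steps)  # battery id of each row

# Hold out 20% of the batteries (not 20% of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(sorted(set(groups[test_idx].tolist())))  # ids of the held-out batteries
```

Because the split only decides which batteries are held out, the rows within each battery stay in their original time order.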

-1 votes, 1 answer

Train data and test data that have target column

I'm trying to build a predictive model using the Banking Dataset - Marketing Targets from Kaggle; here is the link: https://www.kaggle.com/datasets/prakharrathi25/banking-dataset-marketing-targets The dataset from Kaggle has already been separated into…

machine-learning supervised-learning train-test-split

asked Sep 20 '22 at 01:53

Jovian Aditya

-1 votes, 1 answer

train test split by specific class count

I have data that includes X features and Y, a binary class (0 or 1). My problem is imbalanced, so I want to make sure that after the split my y_test contains about 50% samples of class 1. I tried to use train_test_split…

python machine-learning scikit-learn classification train-test-split

asked Jun 13 '22 at 17:58

brian rik
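Passing stratify=y to train_test_split keeps the original class ratio in both subsets. It does not raise the share of class 1 in y_test to 50%. One way to get a balanced test set is to sample it by hand. The sketch below does this with hypothetical labels and an assumed 30 samples per class:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 900 negatives, 100 positives
y = np.array([0] * 900 + [1] * 100)
X = rng.normal(size=(len(y), 4))

n_test_per_class = 30  # assumption: 30 positives + 30 negatives in the test set
pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)

# Draw the same number of rows from each class for the test set
test_idx = np.concatenate([
    rng.choice(pos_idx, size=n_test_per_class, replace=False),
    rng.choice(neg_idx, size=n_test_per_class, replace=False),
])

# Everything not drawn for the test set goes to training
train_mask = np.ones(len(y), dtype=bool)
train_mask[test_idx] = False

X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[test_idx], y[test_idx]

print(y_test.mean())  # 0.5
```

Whether a balanced test set is appropriate depends on how the model will be evaluated. Metrics such as precision change with the class mix.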

-1 votes, 1 answer

Is there a way to remove some rows in the training set based on values in another column

I have a dataframe and I split it into training and testing sets (80:20). It looks like this:

V1  V2  V3  V4  V5  Target
 5   2  34  12   9       1
 1   8  24  14  12       0
12  27   4  12   9       0

Then I built a simple regression model and made predictions. The…

python pandas machine-learning scikit-learn train-test-split

asked May 26 '22 at 14:12

MohammedE

-1 votes, 1 answer

Test train data split - Machine learning

I am trying to do a train/test split (90% and 10%) and used the following code: X_train, X_test, y_train, y_test = train_test_split(pdf.drop(columns = list(set(cols_not_used).union(set(['RANK'])))) , pdf['RANK'], random_state = 13, train_size = 0.9) But…

python pandas machine-learning training-data train-test-split

asked Feb 07 '22 at 09:38

Mounika G

-1 votes, 1 answer

Why am I getting the following error: y should be a 1d array, got an array of shape (423, 2) instead. (in Python)

I am trying to calculate the AUC-ROC curve for a logistic regression function using the following code: from sklearn.linear_model import LogisticRegression from sklearn.metrics import classification_report, confusion_matrix from…

python train-test-split

asked Jun 02 '21 at 15:42

kcardenas
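The question's code is truncated, so the exact cause is unknown. One common way to hit this error is to pass the full output of predict_proba where a 1-D array is expected, because that output has one column per class. The sketch below uses random data and keeps only the positive-class column before computing the AUC:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical binary problem: the label depends on the first feature
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)  # shape (100, 2): one column per class
scores = proba[:, 1]            # 1-D probabilities of class 1

auc = roc_auc_score(y, scores)
print(proba.shape, scores.shape, round(auc, 3))
```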

-1 votes, 1 answer

Using separated test and train files with train_test_split()

I have two .csv files: one is test.csv and the other is train.csv. As you might expect, the test file does not have the target column ('y' in this case), while the train file does. What I wanted to do is first use the train file to…

python pandas machine-learning training-data train-test-split

asked Jan 24 '21 at 12:06

GLHF


-1 votes, 1 answer

ValueError: Found input variables with inconsistent numbers of samples: [5, 11623]

X=latest_df[['open', 'high', 'low', 'volume', 'market']] y=latest_df['close'] y = np.where(df['close'].shift(-1) > df['close'], 1, -1) X = pd.DataFrame(X) y = pd.DataFrame(y) a = X.shape b = y.shape import random random.seed(1234) from…

python pandas scikit-learn reshape train-test-split

asked Jan 24 '21 at 07:03

Aakanksha3010

-1 votes, 2 answers

How to make train and test splitting without target value as separate dataframe?

I can only apply scikit-learn's train_test_split to two dataframes, one with the training data and one with the target data. How can I split my dataframe, including the target value, into a training dataframe and a testing dataframe in a 0.75 proportion? I don't want to…

python python-3.x dataframe scikit-learn train-test-split

asked Dec 21 '20 at 23:20

french_fries

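train_test_split also accepts a single DataFrame. In that case it returns a shuffled train and test split of its rows, with the target column included. Here is a sketch with a hypothetical DataFrame:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical DataFrame with features and the target in one table
df = pd.DataFrame({
    "feature_a": range(8),
    "feature_b": [x * 10 for x in range(8)],
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})

# One input in, two DataFrames out (75% train, 25% test)
train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)

print(len(train_df), len(test_df))  # 6 2
```

The original index labels are kept in both outputs. This makes it easy to trace which rows went where.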

-1 votes, 1 answer

ValueError: Found input variables with inconsistent numbers of samples: [218, 30]

I am doing some facial recognition training using a linear SVC, where my dataset is 870x22. I have 30 frames for each of 29 different people, and I use 22 simple pixel values from each image to recognize the face; those 22 pixels are my features.…

python machine-learning scikit-learn svm train-test-split

asked Aug 23 '20 at 00:29

user11597888

-1 votes, 1 answer

How to use train_test_split? Fix error n_samples = 0

I'm trying to split the data I am working with into training and testing sets but I get the error that n_samples = 0 when I use the train_test_split function. Here's my code: X_train, X_test, y_train, y_test =…

python machine-learning scikit-learn train-test-split

asked Aug 07 '20 at 21:14

John

-1 votes, 1 answer

What's a good R-squared score?

I ran this Linear Regression code and got the R-squared score using the .score() method. However, the score is hard to interpret because it can be negative. The code can be run on your local file system if sklearn is…

python scikit-learn regression linear-regression train-test-split

asked Jul 13 '20 at 16:34

Sriswaroop Koundinya
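R² compares the model's squared error with that of always predicting the mean of y: R² = 1 - SS_res / SS_tot. A perfect fit scores 1.0 and predicting the mean scores 0.0. Anything worse than the mean scores below zero, which is why a regressor's .score() can return negative values. Here is a small check with sklearn.metrics.r2_score:

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]

perfect = r2_score(y_true, [1.0, 2.0, 3.0])     # exact predictions
mean_only = r2_score(y_true, [2.0, 2.0, 2.0])   # always predict the mean (2.0)
reversed_ = r2_score(y_true, [3.0, 2.0, 1.0])   # worse than the mean

print(perfect, mean_only, reversed_)  # 1.0 0.0 -3.0
```

For the reversed predictions, SS_res = 4 + 0 + 4 = 8 and SS_tot = 1 + 0 + 1 = 2, so R² = 1 - 8/2 = -3.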

-1 votes, 1 answer

sklearn train test split by year

I have a dataset that goes from 2016 to 2020 with a 'Year' column. I would like to use 2016-2017 as train data and 2018-2020 as test data. Is there any easy method to perform this data split?

python scikit-learn train-test-split

asked May 12 '20 at 08:09

Matthias Gallagher
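For a time-based split like this one, a boolean mask on the Year column is enough. Unlike train_test_split, it does not shuffle rows across years. Here it is with hypothetical data:

```python
import pandas as pd

# Hypothetical data with a 'Year' column spanning 2016-2020
df = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019, 2020, 2016, 2019],
    "value": [1, 2, 3, 4, 5, 6, 7],
})

train = df[df["Year"] <= 2017]  # 2016-2017
test = df[df["Year"] >= 2018]   # 2018-2020

print(len(train), len(test))  # 3 4
```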

-1 votes, 1 answer

How to split a tuple using train_test_split?

X = (569,30) y = (569,) X_train, X_test, y_train, y_test = train_test_split(np.asarray(X),np.asarray(y),test_size = 0.25, random_state=0) I am expecting output as below: X_train has shape (426, 30) X_test has shape (143, 30) y_train has shape…

python scikit-learn train-test-split

asked Apr 27 '20 at 10:07

Emon
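If X and y in this question are literally the tuples (569,30) and (569,), they describe shapes, not data. For example, np.asarray((569, 30)) is just the 2-element array [569, 30]. train_test_split needs the arrays themselves. With zero-filled placeholder arrays of those shapes, test_size=0.25 yields the expected shapes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the shapes from the question
X = np.zeros((569, 30))
y = np.zeros(569)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# test size = ceil(0.25 * 569) = 143, train size = 569 - 143 = 426
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (426, 30) (143, 30) (426,) (143,)
```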
