Questions tagged [train-test-split]

Questions with this tag are about how to split the machine learning data set into random train and test subsets.

In particular, questions with this tag often ask how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be computed with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
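
As a rough illustration of the helper described above, here is a minimal sketch. The toy arrays and the chosen test_size, random_state, and stratify values are assumptions made for the example, not recommendations from the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 samples, 2 features, balanced binary labels.
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 25% of the samples for testing.
# random_state makes the shuffle reproducible; stratify=y keeps the
# class proportions the same in the train and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)  # (6, 2) (2, 2)
```

X and y must have the same number of samples; otherwise train_test_split raises the "inconsistent numbers of samples" ValueError.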

428 questions
-1
votes
1 answer

ValueError: Found input variables with inconsistent numbers of samples: [951, 2025]

There are many examples of this error in which the problem is related to the dimensions of the array or to how a dataframe is read. However, I'm using just a Python list for both X and Y. I'm trying to split my data into train and test with…
-1
votes
1 answer

How to use multiple datasets from different measurements for one prediction with machine learning models? How to split data into Train Test sets?

I'm working on capacity prediction models for lithium-ion batteries. I have 10 datasets from 10 different batteries, including the capacity and multiple features. Each dataset is time-dependent. In the end I want to predict the capacity for a…
-1
votes
1 answer

Train data and test data that have target column

I'm trying to build a predictive model using the Banking Dataset - Marketing Targets from Kaggle. Here is the link: https://www.kaggle.com/datasets/prakharrathi25/banking-dataset-marketing-targets The dataset from Kaggle has already been separated into…
-1
votes
1 answer

train test split by specific class count

I have data that includes X features and Y, a binary class (0 or 1). My problem is imbalanced, so I want to make sure that after the split my y_test contains about 50% samples classified as 1. I tried to use train_test_split…
-1
votes
1 answer

Is there a way to remove some rows in the training set based on values in another column?

I have a dataframe and I split it into training and testing sets (80:20). It looks like this:

V1  V2  V3  V4  V5  Target
 5   2  34  12   9  1
 1   8  24  14  12  0
12  27   4  12   9  0

Then I built a simple regression model and made predictions. The…
-1
votes
1 answer

Test train data split - Machine learning

I am trying to do a train/test split (90% and 10%) and used the call below:

X_train, X_test, y_train, y_test = train_test_split(pdf.drop(columns=list(set(cols_not_used).union(set(['RANK'])))), pdf['RANK'], random_state=13, train_size=0.9)

But…
-1
votes
1 answer

Why am I getting the following error: y should be a 1d array, got an array of shape (423, 2) instead. (in Python)

I am trying to calculate the AUC-ROC curve for a logistic regression function using the following code: from sklearn.linear_model import LogisticRegression from sklearn.metrics import classification_report, confusion_matrix from…
kcardenas
-1
votes
1 answer

Using separated test and train files with train_test_split()

I have two .csv files: one is test.csv and the other is train.csv. However, as you might expect, the test file does not have the target column ('y' in this case) while the train file does. What I wanted to do is first use the train file to…
GLHF
-1
votes
1 answer

ValueError: Found input variables with inconsistent numbers of samples: [5, 11623]

X = latest_df[['open', 'high', 'low', 'volume', 'market']]
y = latest_df['close']
y = np.where(df['close'].shift(-1) > df['close'], 1, -1)
X = pd.DataFrame(X)
y = pd.DataFrame(y)
a = X.shape
b = y.shape
import random
random.seed(1234)
from…
-1
votes
2 answers

How to make train and test splitting without target value as separate dataframe?

I can apply the scikit-learn function train_test_split only to two dataframes, one with training data and one with target data. But how do I split a dataframe that includes the target value into a training dataframe and a testing dataframe in a proportion of 0.75? I don't want to…
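
One possible approach, sketched under assumptions: train_test_split accepts any number of arrays, including just one, so a single DataFrame that still contains the target column can be passed directly. The frame, its column names, and random_state below are hypothetical and only serve the illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: two feature columns plus the target in one DataFrame.
df = pd.DataFrame({
    "f1": range(8),
    "f2": range(8, 16),
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Passing a single DataFrame returns two DataFrames (train, test);
# every row keeps all of its columns, including the target.
train_df, test_df = train_test_split(df, train_size=0.75, random_state=0)

print(len(train_df), len(test_df))  # 6 2
```

The target can still be separated afterwards, for example with train_df.drop(columns="target") and train_df["target"].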
-1
votes
1 answer

ValueError: Found input variables with inconsistent numbers of samples: [218, 30]

I am doing some facial recognition training using linear SVC, where my dataset is 870x22. I have 30 frames for each of 29 different people, and I am using 22 simple pixel values in the image to recognize the face; these 22 pixels are my features.…
user11597888
-1
votes
1 answer

How to use train_test_split? Fix error n_samples = 0

I'm trying to split the data I am working with into training and testing sets but I get the error that n_samples = 0 when I use the train_test_split function. Here's my code: X_train, X_test, y_train, y_test =…
-1
votes
1 answer

What's a good R-squared score?

I ran this Linear Regression code and I got the R-squared score using the .score() method. However, the score is not easily understandable as the score can go into negative numbers. The code can be run on your local file system if sklearn is…
-1
votes
1 answer

sklearn train test split by year

I have a dataset that goes from 2016 to 2020 with a 'Year' column. I would like to use 2016-2017 as train data and 2018-2020 as test data. Is there any easy method to perform this data split?
Matthias Gallagher
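
For a split on a fixed year boundary like the one asked about here, a random helper is not needed; boolean masks on the 'Year' column are one straightforward option. This is only a sketch: the toy frame and the "value" column are assumptions, while the year ranges come from the question.

```python
import pandas as pd

# Hypothetical data: one row per year, with a made-up "value" column.
df = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019, 2020],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# 2016-2017 become the training set, 2018-2020 the test set.
train_df = df[df["Year"] <= 2017]
test_df = df[df["Year"] >= 2018]

print(train_df["Year"].tolist())  # [2016, 2017]
print(test_df["Year"].tolist())   # [2018, 2019, 2020]
```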
-1
votes
1 answer

How to split a tuple using train_test_split?

X = (569,30)
y = (569,)
X_train, X_test, y_train, y_test = train_test_split(np.asarray(X), np.asarray(y), test_size=0.25, random_state=0)

I am expecting output as below:
X_train has shape (426, 30)
X_test has shape (143, 30)
y_train has shape…
Emon