Questions tagged [train-test-split]

Questions with this tag are about how to split the machine learning data set into random train and test subsets.

In particular, questions with this tag may be aimed at better understanding how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
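
A minimal sketch of the helper referenced above, using a small placeholder array (the data and the 75/25 ratio are purely illustrative):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)                  # 10 matching labels

# 25% of the rows go to the test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)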

428 questions
0 votes • 0 answers

Invalid shape when I train_test_split

I am getting a shape error when I train_test_split the data. The code is as follows: y = data['Cover_type'] X = data.drop('Cover_Type',axis=1). The train_test_split itself doesn't give me an error, but when I fit the GradientBoostingRegressor and find the…
Sajjad Ali • 33 • 6
0 votes • 0 answers

"ValueError: array length 293 does not match index length 975" while applying random forest

I am trying to apply a random forest and I am getting this error: "ValueError: array length 293 does not match index length 975". Please find the code snippet below. Can anyone please tell me what I am doing wrong? Code: from sklearn.model_selection…
0 votes • 1 answer

How can I split a dataframe using sklearn train test split such that there are equal proportions for each category?

I have a dataset with n independent variables and a categorical variable that I would like to perform a regression analysis on. The number of rows of data is different for each category. I would like to split the dataset into test and train data…
Hoppity81 • 61 • 8
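
One way to keep the category proportions equal in both subsets is the stratify argument, sketched here on a made-up toy frame (the column names are hypothetical, not taken from the question):

import pandas as pd
from sklearn.model_selection import train_test_split

# hypothetical frame: two features plus a categorical target column
df = pd.DataFrame({'x1': range(12),
                   'x2': range(12, 24),
                   'category': ['a', 'b', 'c'] * 4})

X = df.drop(columns=['category'])
y = df['category']

# stratify=y keeps each category's share (roughly) the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
print(y_train.value_counts(), y_test.value_counts(), sep='\n')
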
0 votes • 0 answers

My True/False statements in my dataframe change over time in the code (tensorflow.keras)

So I'm using the NASA asteroids dataset with tensorflow.keras for a university assignment. The first thing I wanted was to standardize the data, so I use (1) df = dfprime ss = StandardScaler() df_scaled = df #df.iloc[:,:-1] df_scaled =…
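
A likely cause (an assumption, since the snippet is truncated) is that df_scaled = df only copies the reference, so scaling df_scaled also rewrites df and dfprime. A sketch with a toy frame and a made-up boolean column:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# toy stand-in for the asteroids frame; 'hazardous' is a hypothetical boolean column
dfprime = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'hazardous': [True, False, True]})
df = dfprime                # alias: both names point at the same object

df_scaled = df.copy()       # an explicit copy keeps the original untouched
ss = StandardScaler()
df_scaled.iloc[:, :-1] = ss.fit_transform(df_scaled.iloc[:, :-1])

print(dfprime['hazardous'].tolist())   # still [True, False, True]
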
0 votes • 1 answer

Is the result of a train/test split the same on different machines with a set random_state?

I want to reduce randomness when training models on different machines, and I was wondering whether setting the random_state parameter in sklearn's train_test_split always gives the same results. Is it system-dependent or not? So when running this code on…
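
A short sketch of the behaviour being asked about: with a fixed random_state the chosen indices are deterministic, so the same call gives the same split on any machine (assuming the same scikit-learn/NumPy versions):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)

# two independent calls with the same seed select exactly the same rows
a_train, a_test = train_test_split(X, test_size=0.3, random_state=7)
b_train, b_test = train_test_split(X, test_size=0.3, random_state=7)
print(np.array_equal(a_train, b_train), np.array_equal(a_test, b_test))  # True True
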
0 votes • 1 answer

Why does my cross-validation consistently perform better than train-test split?

I have the code below (using sklearn) that first uses the training set for cross-validation, and for a final check, uses the test set. However, the cross-validation consistently performs better, as shown below. Am I over-fitting on the training…
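
A sketch of the comparison being described, on a bundled toy dataset rather than the asker's data: k-fold scores on the training portion next to a single score on the held-out test set:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # 5-fold CV on training data

model.fit(X_train, y_train)
print('CV mean:', cv_scores.mean())
print('Test score:', model.score(X_test, y_test))
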
0 votes • 1 answer

Error in using accuracy_score from sklearn in Logistic Regression

I am doing a Logistic Regression with the Elastic Net regularization method. I am trying to predict which variables are associated positively or negatively. After running accuracy_score(y_true, y_pred) I got an error:…
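
A sketch of the setup described, using synthetic data; a frequent cause of accuracy_score errors is passing probabilities or continuous values instead of predicted class labels, so predict() is used here:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# the elastic-net penalty requires the saga solver and an l1_ratio
clf = LogisticRegression(penalty='elasticnet', solver='saga',
                         l1_ratio=0.5, max_iter=5000)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)          # class labels, not predict_proba output
print(accuracy_score(y_test, y_pred))
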
0 votes • 1 answer

How to split train and test data from a .mat file in sklearn?

I have an MNIST dataset as a .mat file and want to split train and test data with sklearn. The .mat file reads as below: {'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sat Oct 8 18:13:47 2016', '__version__': '1.0', …
BlueCurve • 33 • 1 • 7
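
A sketch of one way to do this: SciPy's loadmat reads the dictionary shown above, and train_test_split then operates on the arrays. The key names 'X' and 'y' and the file name are placeholders; the real keys depend on how the .mat file was saved (inspect mat.keys() first):

from scipy.io import loadmat
from sklearn.model_selection import train_test_split

mat = loadmat('mnist.mat')     # placeholder path
print(mat.keys())              # '__header__', '__version__', plus the data keys

X = mat['X']                   # hypothetical feature array
y = mat['y'].ravel()           # hypothetical label array, flattened to 1-D

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
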
0 votes • 2 answers

Splitting data into x_train and x_test gives error: Too many values to unpack (expected 2)

Whenever I try to split the data into x_train and x_test I get the following error: Too many values to unpack (expected 2). My code: import glob import matplotlib.pyplot as plt import numpy as np import matplotlib.image as mpimg for img in…
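
For reference, train_test_split returns two arrays per input, so this error usually means the number of names on the left does not match; a sketch with toy arrays:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(8, 3)
y = np.arange(8)

# one input array -> two outputs
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# two input arrays -> four outputs; unpacking these into only two names
# raises "too many values to unpack (expected 2)"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
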
0 votes • 1 answer

Append data to training dataset after train test split

I have split my training and test datasets using a train/test split: lengths = [int(len(supervised_data)*0.8),int(len(supervised_data)*0.2)+1] train_data, test_data = torch.utils.data.random_split(supervised_data, lengths). Now I am trying…
manlike • 45 • 8
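
One way to append samples to the training part after random_split is ConcatDataset; a sketch with toy tensors (the shapes and sizes are placeholders):

import torch
from torch.utils.data import ConcatDataset, TensorDataset, random_split

supervised_data = TensorDataset(torch.randn(100, 4), torch.randn(100, 1))
train_data, test_data = random_split(supervised_data, [80, 20])

extra = TensorDataset(torch.randn(10, 4), torch.randn(10, 1))
train_data = ConcatDataset([train_data, extra])   # training set grows to 90 samples
print(len(train_data), len(test_data))            # 90 20
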
0 votes • 1 answer

Is it fair to base model evaluation on just "train_test_split"?

I'm absolutely confused about model evaluation, interpreting its results, and using cross_val_score. I don't understand why evaluation on a test set is usually considered a final and solid result, while if we just choose another split, we'll get a…
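
The concern can be checked directly by scoring the same model over several random splits instead of one, for example with ShuffleSplit; a sketch on a bundled dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# ten different random 75/25 splits instead of a single train_test_split
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean(), scores.std())   # the spread shows how much one split can vary
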
0 votes • 0 answers

Error when attempting to predict with estimator (matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?))

I have a dataset with 329 features because of one-hot encoding, and I am trying to fit and predict with a linear regression after splitting it into training and test sets. When I try to predict with my y_test I get this…
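
Judging from the wording, a likely cause (an assumption, since the code is truncated) is calling predict on the targets rather than the features; predict expects the feature matrix, and passing y_test produces exactly this kind of matmul shape mismatch. A sketch with random data of the same width:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = np.random.rand(50, 329)    # 329 columns, as in the question
y = np.random.rand(50)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)           # pass X_test here, not y_test
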
0 votes • 1 answer

Train/validate/test split for time series anomaly detection

I'm trying to perform a multivariate time series anomaly detection. I have training data that consists of "normal" data. I train on this data and detect anomalies on the test set that contains normal + anomalous data. My understanding is that it…
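
For time series the split is usually chronological rather than random; shuffle=False in train_test_split keeps the temporal order. A sketch with an ordered placeholder array:

import numpy as np
from sklearn.model_selection import train_test_split

series = np.arange(100).reshape(-1, 1)   # ordered observations

# first 70% train, next 15% validation, last 15% test, all in time order
train, rest = train_test_split(series, test_size=0.30, shuffle=False)
valid, test = train_test_split(rest, test_size=0.50, shuffle=False)
print(train[-1], valid[0], test[0])      # [69] [70] [85]
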
0 votes • 1 answer

Issue creating data for training and testing using 3 folders containing images

I am running: path = Path('/content/drive/MyDrive/X-Ray_Image_DataSet') np.random.seed(41) data = ImageDataBunch.from_folder(dta, train="Train", valid ="Valid", ds_tfms=get_transforms(),size=(256,256), bs=32, num_workers=4).normalize() And I am…
0 votes • 2 answers

Problem when splitting data: KeyError: "None of [Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')] are in the [columns]"

I am attempting to execute a train test split on some data (wine.data), but when initializing x and y: import numpy as np import matplotlib.pyplot as plt import pandas as pd from sklearn.neural_network import MLPClassifier from sklearn.model_selection…
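
That KeyError typically means integer positions were used as column labels (e.g. wine[[1, 2, ...]]); position-based selection needs .iloc. A sketch that follows the question's wine.data file, assuming the usual UCI layout with the class label in column 0 and the 13 features after it:

import pandas as pd
from sklearn.model_selection import train_test_split

wine = pd.read_csv('wine.data', header=None)   # no header row in the UCI file

X = wine.iloc[:, 1:14]   # feature columns selected by position
y = wine.iloc[:, 0]      # class label in the first column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)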