Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag often ask how to split data using scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.
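As a rough illustration of the helper mentioned above, here is a minimal sketch. The toy arrays X and y are made up for the example, and the 75/25 ratio, random_state value, and use of stratify are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (made up for illustration): 20 samples, 2 features, balanced binary labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 25% of the rows for testing.
# random_state makes the shuffle reproducible;
# stratify=y keeps the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Passing shuffle=False instead (without stratify) takes the split in row order, which is one option when a random split is not wanted.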

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

428 questions

4 answers

Split data into training and testing not randomly

I want to split my dataset into two parts, 75% for training and 25% for testing. There are two classes. I also have another dataset that has only one instance of one class; all the other instances belong to the second class. So I don't want to split randomly.…

asked Mar 29 '18 at 19:54

Ara

4 answers

Split into training and testing set in R?

How can I rewrite the following Python code in R? X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) Splitting into training and testing set…

python r machine-learning train-test-split

asked Nov 09 '17 at 19:57

Keshav Maheshwari

2 answers

Why does my model work OK with test data from train_test_split but not with new data?

I am new to machine learning. I have a continuous dataset. I am trying to model the target label using several features. I utilize the train_test_split function to separate the train and the test data. I am training and testing the model using the…

python machine-learning neural-network regression train-test-split

asked Oct 17 '17 at 12:53

Yahya

0 answers

Duplicating pandas.get_dummies columns from train to test data

I have two dataframes, train and test. They both have the same exact column names which contain categorical string features. I'm trying to map these features to dummy variables in the training set, train a regression model, then do the same exact…

python pandas dummy-variable train-test-split feature-engineering

asked Aug 16 '17 at 01:01

Austin

1 answer

How to combine X_train and y_train into one balanced dataframe in Python?

I would highly appreciate your advice with this: I have an imbalanced dataset: y has only 2% of 1s. I want to balance only the train dataset and afterwards perform feature selection on the balanced train dataset prior to the model. After performing…

train-test-split imbalanced-data

asked Jan 26 '23 at 11:48

Ella

1 answer

"Found input variables with inconsistent numbers of samples" Have I done something wrong during the train_test_split?

I am trying to build a logistic regression model and run some tests, but I keep getting this error. I'm not really sure what I have done differently from everyone else. from sklearn import preprocessing X = df.iloc[:,:len(df.columns)-1] y =…

python machine-learning jupyter-notebook data-analysis train-test-split

asked Jan 11 '23 at 15:15

ace brown

1 answer

Can someone help explain why my MLP keeps on getting a perfect classification report?

I am using Sklearn.train_test_split and sklearn.MLPClassifier for human activity recognition. Below is my dataset in a pandas df:

   a_x       a_y       a_z        g_x       g_y        g_z        activity
0  3.058150  5.524902  -7.415221  0.001280  -0.022299  -0.009420  sit
1  …

scikit-learn training-data train-test-split mlp

asked Nov 23 '22 at 20:24

JP1990

0 answers

How is train test split in xgboost cv specified?

Note that the xgboost.cv method returns eval metrics on both train and test sets, whereas the function itself takes no parameter stating which dataset is to be used for training and which for testing. The xgboost.cv method takes only dtrain…

python xgboost cross-validation train-test-split

asked Oct 30 '22 at 07:03

wasif

2 answers

Split rows in train test based on user id PySpark

I have a PySpark dataframe containing multiple rows for each user:

userId  action  time
1       buy     8 AM
1       buy     9 AM
1       sell    2 PM
1       sell    3 PM
2       sell    10 AM
2       buy     11 AM
2       sell    2 PM
2       sell    3 PM

My goal is to split this dataset into a…

python apache-spark pyspark train-test-split

asked Sep 07 '22 at 08:55

mht

0 answers

k-fold implementation with train test split

I am trying to add k-fold to my code, as overfitting is an issue. Previously I split my data into train and test sets, but I am confused about where and how to apply k-fold now that my data is already split. x_norm = preprocessing.normalize(x,…

conv-neural-network train-test-split k-fold overfitting-underfitting

asked Jul 02 '22 at 06:49

luffy

1 answer

Reshape your data either using array.reshape(-1, 1) during model.predict()?

I'm trying to run a number of classification models, but all of them keep throwing the reshape error. I think it has to do with the calculation of model.score or model.predict, but I've tried running some reshape commands (on X_valid and Y_valid)…

python model reshape train-test-split

asked Apr 26 '22 at 07:59

Brian

1 answer

How to use an explicit validation set with predefined split fold?

I have explicit train, test and validation sets as 2d arrays:

X_train.shape (1400, 38785)
X_val.shape (200, 38785)
X_test.shape (400, 38785)

I am tuning the alpha parameter and need advice about how I can use the predefined validation set in…

python-3.x validation scikit-learn cross-validation train-test-split

asked Apr 04 '22 at 16:29

Bluetail

0 answers

Cannot fit a Model after Performing Stratified K-Fold Split

I am new to the concept of using K-folds to split into train and test data, which I am practicing with the dataset below. Context: The Dataset is the Kaggle UrbanSound8k set available at https://www.kaggle.com/datasets/chrisfilo/urbansound8k I am…

python numpy scikit-learn train-test-split k-fold

asked Mar 26 '22 at 01:57

ShrunkenDown

1 answer

Data Cardinality keras odd number of images- train test split

My autoencoder shows a "ValueError: Data cardinality is ambiguous: x sizes: 14 y sizes: 31 Make sure all arrays contain the same number of samples." split_size_i = int(images.shape[0]*0.7) split_size =…

tensorflow keras jupyter-notebook valueerror train-test-split

asked Feb 22 '22 at 22:52

Starcode1619

0 answers

5-fold cross validation from sklearn with train, val, and test sets and ratio of 60/20/20

I am able to create train, validation, and test sets for one fold experiments using sklearn like below with train, val and test having a ratio of 60/20/20: x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4,…

python machine-learning scikit-learn cross-validation train-test-split

asked Jan 07 '22 at 07:20

Mona Jalal
