Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning dataset into random train and test subsets.

In particular, questions with this tag may ask how to split data with scikit-learn, where a random split into training and test sets can be computed quickly with the train_test_split helper function.
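As a minimal sketch of the helper mentioned above, the following splits a small toy dataset into train and test subsets. The arrays `X` and `y`, the 80/20 ratio, and the `random_state` value are illustrative assumptions, not part of the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples with 2 features each, and one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the samples for testing; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

Rows of `X` and entries of `y` are shuffled together, so each feature row stays paired with its label in both subsets.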

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

428 questions

2 answers

Do you have to clean your test data before feeding into an NLP model?

This is a natural language processing related question. Suppose I have a labelled train set and an unlabelled test set. After I have cleaned my train data (stopwords, stemming, punctuation, etc.), I use this cleaned data to build my model. When fitting it on my…

asked Feb 21 '21 at 10:29

graphboy

1 answer

Split dataset containing multiple labels

I have a dataset with multiple labels, i.e. for each X I have 2 y, and I need to split it into train and test sets. I tried with the sklearn function train_test_split(): import numpy as np from sklearn.model_selection import train_test_split X =…

python numpy scikit-learn train-test-split

asked Feb 05 '21 at 02:05

tobor

1 answer

Split values in a column in "is a date" or "NaT"

I would like to find values in a column (clear_date) that do not correspond to a valid date. The date is formatted as '%Y/%m/%d'. I've tried the following piece of code, but the resulting variable doesn't have any rows! x_test =…

python pandas dataframe machine-learning train-test-split

asked Jan 24 '21 at 14:25

Ayush Vishwakarma

1 answer

Why are results inaccurate when I use a different dataset for testing a model in machine learning?

I am trying to do forecasting based on time series. I am doing temperature forecasting by using the past three years of hourly data. Instead of using X_test from train_test_split method, I am using my own test dataset because I need seven-day ahead…

python machine-learning forecasting train-test-split

asked Jan 17 '21 at 14:27

Chirag Jain

1 answer

Stratified train/test-split with guaranteed inclusion of small classes on strongly imbalanced datasets

I am working with large-scale, imbalanced datasets where I need to pick a stratified training set. However, even if the dataset is strongly imbalanced, I still need to ensure that every label class is included at least once in the training…

python scikit-learn train-test-split

asked Jan 08 '21 at 21:10

Andreas

0 answers

Split dataset into train and test for an LDA model

I have a dataset that contains data on about 17000 users scraped from Twitter, and I am working with the Latent Dirichlet Allocation algorithm. I want to split my dataset but I am not sure what the best way is. What are the criteria to split a dataset…

dataset cross-validation training-data lda train-test-split

asked Jan 07 '21 at 09:09

hajar hajar

0 answers

Patsy Dmatrices X, y split

I am using patsy.dmatrices to split my data into y, x and I am losing observations. Ex: formula = 'target ~ v1 + v2 + v3' y, x = patsy.dmatrices(formula, df, return_type = 'dataframe') My df has ~54,000,000 rows; however, following the x/y split, my…

python train-test-split patsy

asked Dec 30 '20 at 21:05

Joshua Paiva

1 answer

Kernel gets stuck if I train/test split by 55% and 45%

I am trying to train a neural net on a dataset. Everything works. There is no issue with the code if I specify 70% or 50% of the data as training and the rest as testing. But when I specify 55% and 45% for training and testing, the kernel…

r neural-network train-test-split

asked Dec 18 '20 at 07:22

Saad Zaheer

1 answer

How to install sensplit on Google Colab?

How do I install sensplit on Google Colab? I already cloned the git repository on Google Colab but I couldn't use the sensplit package; when I run !pip install sensplit it returns errors. Please, I need a hint. Thanks in advance

python time-series google-colaboratory sensors train-test-split

asked Dec 14 '20 at 11:04

Sofia

1 answer

Why does the line not cut across the data?

I am using a linear regression model to predict my data. When I use seaborn's lmplot, I can see the line cut through all the data points. But when I use the train_test_split function, the coefficient and intercept are as below: Weight = …

python graph linear-regression train-test-split

asked Dec 05 '20 at 06:36

Tep66

1 answer

Split train/test based on comparison operators

I'm trying to figure out how to split the data based on these conditions in order to run a CNN on this: Split the training/testing dataset into two sets: one with class labels < 5 and one with class labels >= 5. Print out the shapes of the resulting…

python tensorflow conv-neural-network train-test-split

asked Nov 28 '20 at 23:12

runner16

1 answer

I keep on getting the error name 'y_test' is not defined

I really need your help! I've written this code: from sklearn.model_selection import train_test_split from sklearn import metrics from sklearn.metrics import accuracy_score def train_test_rmse(x,y): X = df_new[feature_cols] y =…

python scikit-learn linear-regression predict train-test-split

asked Nov 27 '20 at 19:06

eopiyo

1 answer

Can we tune any of the parameters on testing data, including any parameters learned by preprocessing?

I want to normalize the data using StandardScaler, but I have doubts about how this should be done. One way to do this is as follows: scaler = StandardScaler().fit(X) X = scaler.transform(X) X_train, X_test, y_train,…

machine-learning normalization hyperparameters train-test-split

asked Nov 20 '20 at 17:53

ma_shamshiri

1 answer

Python - Predicting test data that is smaller than train data

I have preprocessed some data ready to train a Multinomial Naive Bayes classifier. The train data is 80% of my data and the test data is 20%. The train data is an array of size 8452 and the test data is an array of size 4231. If I want to see…

python classification naivebayes train-test-split

asked Nov 17 '20 at 16:05

apol96

1 answer

Split dataset into train and test using TensorFlow

I want to split my full dataset (each raw data point has multiple features) into train and test sets. Rather than using scikit-learn's train_test_split, is there any other proper way to split my data? I also need to shuffle my data when…

tensorflow machine-learning train-test-split

asked Nov 15 '20 at 18:20

Dale Steyn
