Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag often ask how to split data using scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
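
As a minimal sketch of the helper described above, the following splits a small toy array. The data, the 20% test size and the random_state value are assumptions chosen for illustration only; they are not part of the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 10 samples, 2 features, binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows as the test set.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

print(X_train.shape, X_test.shape)
```

With 10 samples and test_size=0.2, two rows end up in the test set and eight in the training set. Passing stratify=y is one option when the class proportions should be preserved in both subsets.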

428 questions

votes

1 answer

GridSearchCV - which data should be used?

I am currently working on a binary classification problem with about 2000 data points in the training set, and I wonder whether I should use the whole training set for the grid search or split it first to generate validation data. I have the…

asked Nov 10 '20 at 10:30

4ndy94

votes

0 answers

Error zsh: killed when running python program that loads a txt file

So I need to split a txt file into a testing and a training file (also txt). I've run the code below on a smaller data set and it works perfectly. But when I try to load the complete data set (3 GB), it fails with zsh: killed. Is there any way to…

python scikit-learn zsh train-test-split

asked Nov 02 '20 at 23:41

Luis Guillermo

votes

1 answer

Type error: Singleton array while trying to split the dataset in Python using train_test_split()

This is the format of the dataset. This is my code: import numpy as np import matplotlib.pyplot as plt import pandas as pd #Importing the dataset dataset1 = pd.read_csv('DATASETS/movielens movie…

python regression artificial-intelligence train-test-split

asked Oct 25 '20 at 08:36

sonus vareed

votes

0 answers

train_test_split() function behavior

How can the values in X become this in X_train after train_test_split() function? How can I avoid that?

data-mining train-test-split

asked Oct 01 '20 at 22:51

user12239061

votes

2 answers

What is the right time to perform train_test_split when building a model with text and categorical features?

I am trying to train a model which takes a mixture of numerical, categorical and text features. My question is: which one of the following should I do to vectorize my text and categorical features? I split my data into train, cv and test for…

python machine-learning data-science countvectorizer train-test-split

asked Sep 30 '20 at 17:04

Sandeep Maurya

votes

1 answer

How to create a train_test_split based on a conditional in python

I know how to utilize a basic train_test_split: from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123) However, what if I want to divide my training and…

python scikit-learn train-test-split

asked Sep 05 '20 at 20:31

bismo

votes

2 answers

Should I split my only dataset into train and test sets, or can I use all of it for a regression problem?

In Kaggle competitions, we have a train and a test dataset. So we usually develop a model on the training dataset and evaluate it with a test dataset that is unseen by the algorithm. I was wondering what is the best method for validation of a…

regression data-science train-test-split

asked Sep 04 '20 at 03:59

Med

votes

0 answers

How do I test multiple test/train splits

I am trying to test several train/test splits when running my LSTM model and then display the results like the graph below, i.e. train 80 test 20, train 70 test 30, train 60 test 40, train 50 test 50, mean. How best can I do this in Python (test…

python matplotlib train-test-split

asked Aug 31 '20 at 20:55

Tamarie

votes

1 answer

How to test accuracy from images that were not in the dataset

I am using train_test_split to train and test my data. Dividing the data into training and test sets is an interesting concept, but what if I want to load some data that wasn't in the test data? My problem is that train_test_split treats data…

python machine-learning scikit-learn svm train-test-split

asked Aug 21 '20 at 14:38

user11597888

votes

0 answers

I keep getting a value error in the train test split function despite trying to adjust the params

Here is what I keep getting and I can't figure out why. I've adjusted the params but to no avail. tst=[list of…

pandas train-test-split

asked Aug 20 '20 at 09:35

WITCOHE

votes

1 answer

Return the index of selected test set Python

I'm trying to get the index of which data is selected by test data. First I use train-test-split for my data A = [[1,2],[3,4],[6,2],[3,4]] y = [1,0,0,1] from sklearn.model_selection import train_test_split A_train, A_test,y_train,y_test =…

python scikit-learn train-test-split

asked Aug 19 '20 at 07:44

Martin
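
One common way to recover which rows landed in the test set, sketched here under assumptions, is to pass an extra index list through train_test_split alongside the data. The values of A and y are taken from the excerpt above; the indices list, test_size=0.25 and random_state=0 are hypothetical choices, since the original call is truncated.

```python
from sklearn.model_selection import train_test_split

# Data from the question excerpt.
A = [[1, 2], [3, 4], [6, 2], [3, 4]]
y = [1, 0, 0, 1]

# Hypothetical helper list: the original position of every row.
indices = list(range(len(A)))

# train_test_split splits any number of equal-length sequences consistently,
# so idx_test holds the original positions of the rows in A_test.
# test_size and random_state are assumed values, not from the question.
A_train, A_test, y_train, y_test, idx_train, idx_test = train_test_split(
    A, y, indices, test_size=0.25, random_state=0
)

print(idx_test)
```

Looking up A[i] for each i in idx_test gives back the same rows as A_test, which is a quick way to check the mapping.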

votes

0 answers

How can I do a train_test_split in sklearn but limit/specify the output according to a certain member of a column? [closed]

I am training a model to do weather data prediction. I found a method on github that does it pretty well with stuff like SVM and SVC. It uses a dataset that basically looks like this, Dhaka is a city/station name Station Yea Month…

python-3.x scikit-learn train-test-split

asked Aug 15 '20 at 12:15

CatVI

votes

1 answer

What is the best way to split data and why?

I am taking an online course in deep learning. They used the following code for determining train, validation and test data: (The shuffling step is before it, which I did not write here) samples_count = shuffled_inputs.shape[0] # Count the samples…

tensorflow deep-learning train-test-split

asked Jul 31 '20 at 20:26

user12892310

votes

1 answer

How to best determine the accuracy of a model? Repeated train/test splits or cv?

I'm creating a classifier that takes vectorized book text as input and as output predicts whether the book is "good" or "bad". I have 40 books, 27 good and 13 bad. I split each book into 5 records (5 ten-page segments) to increase the amount of…

python machine-learning scikit-learn cross-validation train-test-split

asked Jul 23 '20 at 04:26

rbb

votes

1 answer

How to use train_test_split with one-hot encoded data?

I am dealing with unbalanced data and trying to improve my model by using stratified data. The problem is that I am unsure how to do so exactly. Everything I have tried so far doesn't change anything. It should be something like this: X_train,…

python keras scikit-learn one-hot-encoding train-test-split

asked Jul 22 '20 at 17:35

Kirk1746
