Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag may ask how to split data using scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
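
As a minimal sketch of the helper described above, the snippet below splits a toy array into train and test subsets. The data, the 20% test size, and the random_state value are illustrative assumptions, not part of the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 10 samples, 2 features, binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows as a test set.
# random_state fixes the shuffle so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Rows of X and y are shuffled together, so each label stays paired with its sample.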

428 questions
0
votes
1 answer

GridSearchCV - which data should be used?

I am currently working on a binary classification problem with about 2000 data points in the training set and I wonder if I should use the whole training set for gridsearch or if I should do a split first to generate validation data. I have the…
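
One common pattern for the situation in this question is sketched below: GridSearchCV runs its own cross-validation inside the data it is given, so it can be fitted on the whole training portion while a separate test set is held back for a final check. The synthetic data, the SVC estimator, and the parameter grid are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data (assumed, stands in for the real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Keep a test set aside; it is never seen during the grid search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5-fold cross-validation over the training portion for each candidate C.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# Score the refitted best model on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```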
0
votes
0 answers

Error zsh: killed when running python program that loads a txt file

So I need to split a txt file into a testing and a training file (also txt). I've run the code below for a smaller data set and it works perfectly. But it fails when I try to load the complete data set (3 GB), and I get zsh: killed. Is there any way to…
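
If the process is killed because the whole file is loaded into memory (an assumption, since the excerpt does not show the code), one option is to stream the file line by line. The function name split_text_file, its parameters, and the demo file below are hypothetical; the sketch sends each line to the test file with a fixed probability, so the resulting fraction is approximate rather than exact.

```python
import os
import random
import tempfile


def split_text_file(src_path, train_path, test_path, test_fraction=0.2, seed=0):
    """Stream src_path line by line, writing each line to train or test at random."""
    rng = random.Random(seed)
    n_train = n_test = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(train_path, "w", encoding="utf-8") as train, \
         open(test_path, "w", encoding="utf-8") as test:
        for line in src:  # only one line is held in memory at a time
            if rng.random() < test_fraction:
                test.write(line)
                n_test += 1
            else:
                train.write(line)
                n_train += 1
    return n_train, n_test


# Small demo on a throwaway 1,000-line file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.writelines(f"row {i}\n" for i in range(1000))
    n_train, n_test = split_text_file(
        src, os.path.join(tmp, "train.txt"), os.path.join(tmp, "test.txt")
    )
    print(n_train, n_test)
```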
0
votes
1 answer

Type error: Singleton array while trying to split the dataset in Python using train_test_split()

This is the format of the dataset: [image] This is my code: import numpy as np import matplotlib.pyplot as plt import pandas as pd #Importing the dataset dataset1 = pd.read_csv('DATASETS/movielens movie…
0
votes
0 answers

train_test_split() function behavior

How can the values in X become this in X_train after train_test_split() function? How can I avoid that?
user12239061
0
votes
2 answers

What is the right time to perform train_test_split when building a model with text and categorical features?

I am trying to train a model which takes a mixture of numerical, categorical and text features. My question is: which of the following should I do to vectorize my text and categorical features? I split my data into train, cv and test for…
0
votes
1 answer

How to create a train_test_split based on a conditional in python

I know how to utilize a basic train_test_split: from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123) However, what if I want to divide my training and…
bismo
  • 1,257
  • 1
  • 16
  • 36
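
The excerpt above is cut off before the condition is stated, so the following is only a sketch under an assumed condition: rows are assigned to the test set by a boolean mask on a hypothetical "year" column instead of at random. The DataFrame and all column names are invented for illustration.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions, not from the question.
df = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019, 2020],
    "feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "target": [0, 1, 0, 1, 0, 1],
})

# Assumed rule: rows from 2019 onward form the test set.
is_test = df["year"] >= 2019
train_df = df[~is_test]
test_df = df[is_test]

X_train, y_train = train_df[["feature"]], train_df["target"]
X_test, y_test = test_df[["feature"]], test_df["target"]

print(len(train_df), len(test_df))
```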
0
votes
2 answers

Should I split my only dataset into train and test sets, or can I use the whole of it for a regression problem?

In Kaggle competitions, we have a train and test dataset. So we usually develop a model on the training dataset and evaluate it with a test dataset that is unseen by the algorithm. I was wondering what is the best method for validation of a…
Med
  • 1
  • 1
0
votes
0 answers

How do I test multiple test/train splits

I am trying to test several train/test splits when running my LSTM model and then display the results like the graph below, i.e. train 80 test 20, train 70 test 30, train 60 test 40, train 50 test 50, mean. How best can I do this in Python (test…
Tamarie
  • 125
  • 2
  • 6
  • 18
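
A rough sketch of looping over several split ratios is shown below. It swaps in a logistic regression on synthetic data in place of the LSTM from the question, purely to keep the example small; the data set and model are assumptions, and only the loop structure is the point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed); replace with the real inputs and targets.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = {}
for test_size in (0.2, 0.3, 0.4, 0.5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    # Stand-in model; an LSTM would be trained and evaluated here instead.
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores[test_size] = model.score(X_test, y_test)

mean_score = float(np.mean(list(scores.values())))
for test_size, acc in scores.items():
    print(f"train {1 - test_size:.0%} / test {test_size:.0%}: accuracy {acc:.3f}")
print(f"mean: {mean_score:.3f}")
```

The scores dictionary can then be passed to any plotting library to draw one bar per split plus the mean.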
0
votes
1 answer

How to test accuracy from images that were not in the dataset

I am using train_test_split to train and test my data. Dividing the data into training and test sets is an interesting concept, but what if I want to load some data that wasn't in the test data? My problem is that train_test_split treats data…
user11597888
0
votes
0 answers

I keep getting a value error in the train test split function despite trying to adjust the params

Here is what I keep getting and I can't figure out why. I've adjusted the params but to no avail. tst=[list of…
WITCOHE
  • 1
  • 2
0
votes
1 answer

Return the index of the selected test set in Python

I'm trying to get the indices of the data selected for the test set. First I use train_test_split on my data: A = [[1,2],[3,4],[6,2],[3,4]] y = [1,0,0,1] from sklearn.model_selection import train_test_split A_train, A_test, y_train, y_test =…
Martin
  • 41
  • 2
  • 6
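
One way to recover which rows ended up in the test set is to pass an index array through train_test_split alongside the data, since the function splits every array it receives in the same way. The sketch below reuses A and y from the excerpt; the test_size and random_state values are assumptions because the original call is truncated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

A = [[1, 2], [3, 4], [6, 2], [3, 4]]
y = [1, 0, 0, 1]

# Row positions 0..3, split together with A and y.
indices = np.arange(len(A))

A_train, A_test, y_train, y_test, idx_train, idx_test = train_test_split(
    A, y, indices, test_size=0.5, random_state=0  # assumed parameters
)

# idx_test[k] is the position in A of the row A_test[k].
print(idx_test)
```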
0
votes
0 answers

How can I do a train_test_split in sklearn but limit/specify the output according to a certain member of a column? [closed]

I am training a model to do weather data prediction. I found a method on GitHub that does it pretty well with stuff like SVM and SVC. It uses a dataset that basically looks like this (Dhaka is a city/station name): Station Yea Month…
CatVI
  • 45
  • 8
0
votes
1 answer

What is the best way to split data, and why?

I am taking an online course in deep learning. They used the following code for determining train, validation and test data (the shuffling step comes before it; I did not include it here): samples_count = shuffled_inputs.shape[0] # Count the samples…
user12892310
0
votes
1 answer

How to best determine the accuracy of a model? Repeated train/test splits or cv?

I'm creating a classifier that takes vectorized book text as input and as output predicts whether the book is "good" or "bad". I have 40 books, 27 good and 13 bad. I split each book into 5 records (5 ten-page segments) to increase the amount of…
0
votes
1 answer

How to train_test_split with one-hot encoded data?

I am dealing with unbalanced data and trying to improve my model by using stratified data. The problem is that I am unsure how to do so exactly. Everything I have tried so far doesn't change anything. It should be something like this: X_train,…
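
As a sketch of a stratified split with one-hot encoded targets, the snippet below converts the one-hot rows back to integer class labels with argmax and passes those to the stratify parameter of train_test_split. The toy data with an 80/20 class imbalance is an assumption for illustration; the question's actual code is truncated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced data (assumed): 80 samples of class 0, 20 of class 1.
X = np.arange(200).reshape(100, 2)
labels = np.array([0] * 80 + [1] * 20)
y_onehot = np.eye(2, dtype=int)[labels]  # shape (100, 2)

# Recover integer class labels from the one-hot rows for stratification.
strat_labels = y_onehot.argmax(axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.25, stratify=strat_labels, random_state=0
)

# Both subsets keep the 80/20 class ratio.
print(y_train.sum(axis=0), y_test.sum(axis=0))
```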