Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag often aim at a better understanding of how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be quickly computed with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
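
As a minimal sketch of the helper described above, the following splits a small toy array 80/20. The data values and the choice of test_size=0.2 and random_state=0 are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 10 samples with 2 features each,
# built so that the first feature of row i equals 2 * y[i].
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the samples for testing; random_state fixes the shuffle
# so that the same split is produced on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

Passing X and y in the same call keeps features and labels aligned, since both are shuffled with the same permutation.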

428 questions

votes

1 answer

Splitting train test sets for Node2vec link prediction in Stellargraph

I'm trying to understand how to use Stellargraph's EdgeSplitter class. In particular, the examples in the documentation for training a link prediction model based on Node2Vec split the graph into the following parts: Distribution of samples across…

asked Aug 27 '20 at 10:26

Jaime Oliver Huidobro

votes

1 answer

scikit learn test_data_split: ValueError: Found input variables with inconsistent numbers of samples:[4999, 5000]

Here is my code print(len(image_dataset.data)) print(len(phylum_target)) X_train, X_test, y_train, y_test = train_test_split(image_dataset.data, phylum_target, test_size=0.2,random_state=109) And here is output and Error 5000 5000 Traceback (most…

python scikit-learn train-test-split

asked Aug 26 '20 at 02:42

Changwan Seo

votes

0 answers

scikit learn train_test_split() splitting data unexpectedly

I'm facing an issue where sklearn's train_test_split() divides large data sets unexpectedly. I'm trying to load the entire 118 MB data set, and the test set it produces is more than 10 times smaller than what the code should give. Case…

python scikit-learn python-3.6 python-3.7 train-test-split

asked Jul 08 '20 at 18:01

Anas Khan

votes

1 answer

Train and test split set using ImageDataGenerator and flow

I'm trying to make a network using augmentation. First I use ImageDataGenerator with validation_split=0.2. train_generator = ImageDataGenerator( rotation_range=90, zoom_range=0.15, width_shift_range=0.2, height_shift_range=0.2, …

keras data-augmentation train-test-split

asked Jun 22 '20 at 01:25

Thiago xavier rocha de souza

votes

2 answers

Dimensional problem in using train test split

from sklearn.model_selection import train_test_split predictors=data.drop(['target'],axis=1) targets=data['target'] train_x,test_x,train_y,test_y=train_test_split(predictors,targets,test_size=0.2,random_state=0) shape of train_x is…

python jupyter-notebook train-test-split

asked Jun 03 '20 at 17:44

John Michaels

votes

1 answer

NameError: name 'skimage' is not defined

I'm trying to figure out how to use SVM for image classification using images from my own dataset, for which I'm using the notebook from this link: https://github.com/whimian/SVM-Image-Classification. The problem is that, for whatever other project I…

python svm scikit-image train-test-split

asked May 02 '20 at 22:23

user11597888

votes

1 answer

Cannot impute 1D array with fit_transform from sklearn library (split-test)

I'm trying to impute a 1D array of shape (14599,) with SimpleImputer using the most_frequent strategy, but it says it expected a 2D array. I already tried reshaping it to (-1,1) and (1,-1), but it raises the error ValueError: could not broadcast input array from shape…

python arrays numpy scikit-learn train-test-split

asked Mar 01 '20 at 12:25

random student

votes

1 answer

Error message when I try to do train test split on credit card default data

I tried to do a train test split on credit card default data from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients# This is my code: import sklearn import pandas as pd data = pd.read_excel("default of credit card clients.xls",…

python arrays train-test-split

asked Dec 26 '19 at 20:11

Zoran Pašić

votes

2 answers

How to split data into train and test keeping in mind the groupby column in pandas?

I would like to split the data set into test and train datasets in the ratio 20:80. However, I do not want the split to leave a single S_Id value with some data points in train and others in test. I have a dataset as: S_Id …

python pandas train-test-split

asked Jul 19 '19 at 19:30

Jupyter
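
One common way to approach a group-aware split like the one asked about above is scikit-learn's GroupShuffleSplit. This is only a sketch under assumptions: the toy data frame and the "value" column are hypothetical, and only the S_Id column name comes from the question. Note that test_size here applies to the number of groups, so the row ratio is only approximately 20:80 when groups differ in size.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: five S_Id groups with two rows each.
df = pd.DataFrame({
    "S_Id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "value": list(range(10)),
})

# test_size is the fraction of *groups* placed in the test set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["S_Id"]))

train_df = df.iloc[train_idx]
test_df = df.iloc[test_idx]

# No S_Id value appears on both sides of the split.
print(set(train_df["S_Id"]) & set(test_df["S_Id"]))
```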

votes

1 answer

How is train_test_split with test_size=0 affecting the data?

I was using train_test_split in my code and then wanted to change it to cross validation, but something strange is happening. train, test = train_test_split(data, test_size=0) x_train = train.drop('CRO', axis=1) y_train = train['CRO'] scaler =…

python machine-learning cross-validation train-test-split

asked Apr 30 '19 at 16:24

SlimakSlimak

votes

3 answers

Train test split based on a column values - sequentially

I have a data frame as below: df = pd.DataFrame({"Col1": ['A','B','B','A','B','B','A','B','A', 'A'], "Col2" : [-2.21,-9.59,0.16,1.29,-31.92,-24.48,15.23,34.58,24.33,-3.32], "Col3" :…

python-3.x pandas train-test-split

asked Apr 22 '19 at 14:39

Shijith


votes

1 answer

PySpark randomSplit vs SkLearn Train Test Split - Random Seed Question

Let's say I have a pandas dataframe and apply sklearn.model_selection.train_test_split with the random_state parameter set to 1. Let's say I then take the exact same pandas dataframe and create a Spark Dataframe with an instance of SQLContext. If I…

apache-spark scikit-learn pyspark train-test-split

asked Mar 31 '19 at 05:15

Odisseo

votes

2 answers

Order between using validation, training and test sets

I am trying to understand the process of model evaluation and validation in machine learning. Specifically, in which order and how the training, validation and test sets must be used. Let's say I have a dataset and I want to use linear regression.…

machine-learning cross-validation train-test-split

asked Jan 10 '19 at 10:36

david fdez

votes

1 answer

Splitting dataset for training and testing row wise

I want to split my dataset into training and test datasets based on years. The idea is to put the rows with years ranging from 2009 to 2017 in the train dataset and the 2018 data in the test dataset. Splitting the datasets was easy for the most part but my…

python machine-learning time-series train-test-split

asked Oct 10 '18 at 08:13

Fareen Walani
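
A year-based split like the one described above can be sketched with plain pandas boolean masks. The "year" and "value" column names and the toy rows are assumptions for illustration; the question's actual columns are not shown in the excerpt.

```python
import pandas as pd

# Hypothetical toy data with a "year" column (assumed name).
df = pd.DataFrame({
    "year": [2009, 2012, 2017, 2018, 2018],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Rows from 2009 through 2017 (inclusive) go to train; 2018 rows go to test.
train_df = df[df["year"].between(2009, 2017)]
test_df = df[df["year"] == 2018]

print(len(train_df), len(test_df))
```

Unlike train_test_split, this keeps the chronological order intact, which is usually the point of splitting time-series data by year.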

votes

1 answer

how to correct ImportError: cannot import name 'murmurhash3_32'

I installed the scikit-learn library in Python using the command pip install -U scikit-learn. When I try to import the library or one of its modules, like from sklearn.model_selection import train_test_split or simply import sklearn, I am getting the…

python scikit-learn train-test-split

asked Jul 26 '18 at 12:01

Aklank Jain

