Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag may aim at a better understanding of how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
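
As a rough illustration of the helper described above, here is a minimal sketch of a random 70/30 split. The toy arrays, the test_size of 0.3 and the random_state value are assumptions chosen only for this example; they are not recommendations from the tag description.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
# Each label equals its row index, so the row/label pairing can be checked later.
y = X[:, 0] // 2

# Shuffle and split: 30% of the rows go to the test set.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)

# Rows of X and entries of y are shuffled together, so each label
# still belongs to the row it was originally paired with.
print(np.array_equal(X_train[:, 0] // 2, y_train))
```

With 10 samples and test_size=0.3, the test set gets 3 rows and the training set gets the remaining 7. For classification data, passing stratify=y would additionally try to keep class proportions similar in both subsets.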

428 questions

-1

votes

1 answer

Train/Test Datasets in Machine Learning

I just have a general question: In a previous job, I was tasked with building a series of non-linear models to quantify the impact of certain factors on the number of medical claims filed. We had a set of variables we would use in all models (eg:…

machine-learning train-test-split

asked Mar 26 '20 at 17:11

user2813606

-1

votes

2 answers

Does scikit-learn train_test_split preserve relationships?

I am trying to understand this code. I do not understand how if you do: x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test... you can later do: (len(x_validation[y_validation == 0]) surely…

python scikit-learn train-test-split

asked Dec 19 '19 at 15:08

schoon

-1

votes

1 answer

Extract smaller table from pivot table pandas

I want to split the following pivot table into training and testing sets (to evaluate recommendation system), and was thinking of extracting two tables with non-overlapping indices (userID) and column values (ISBN). How can I split it properly?…

python pandas pivot-table train-test-split

asked Nov 28 '19 at 01:45

Helen Grey

-1

votes

1 answer

How can I split a matrix into training and testing data whilst ensuring there is at least one value present in each row and column of the training matrix?

I want to randomly split a sparse matrix into training and testing data of the same dimensions whilst ensuring there are no columns or rows full of zeros in the training set. For my algorithms to work I need at least one value in each row and column…

python machine-learning cross-validation train-test-split

asked Apr 07 '19 at 14:45

Sophia Bouchama

-1

votes

2 answers

Test Train Split : error

How can I split my df: X=Final_df.drop('survived',axis=1) Y=Final_df['survived'] X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=123 ) logreg=LogisticRegression() logreg.fit(X_train,Y_train) train,test =…

python pandas dataframe scikit-learn train-test-split

asked Jul 21 '18 at 06:06

sathish kumar

-2

votes

1 answer

Create X train and Y Train for CSV dataset in Python

I would like to ask about creating x_train, y_train and x_test, y_test from a CSV data set that has already been split into two files, data_train.csv and data_test.csv.

python scikit-learn train-test-split

asked Jun 19 '22 at 14:39

Rifaldy Tajrial

-2

votes

1 answer

Sorting train_test_split data by numpy array

I want to split the following numpy arrays for training and testing: X, y and qid X is a set of featurized documents - shape: (140, 105) qid is a set of query identifiers for each document - shape: (140,) y is a set of labels for each (X, qid) pair…

python arrays numpy scikit-learn train-test-split

asked Apr 06 '22 at 14:58

krakken

-2

votes

1 answer

What is the difference between train and test?

from sklearn.model_selection import train_test_split x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=87) plt.scatter(x_train[:, 0], x_train[:,1], c=y_train) Can someone explain the code to me, what is…

scikit-learn model train-test-split

asked Jan 31 '22 at 04:22

洪啓善

-2

votes

1 answer

stratified 5-fold cross validation for continuous-value target: The least populated class in y has only 1 member, which is too few

For this code: #x_train, x_val, y_train, y_val=train_test_split(x,y,test_size=0.3, random_state=42) x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=42, stratify=y) train = [x_train, y_train] I get the following…

python machine-learning scikit-learn cross-validation train-test-split

asked Dec 02 '21 at 23:54

Mona Jalal

-2

votes

1 answer

Train/test division for each month in time series in python

I have time series data. Instead of using the first 80% of the data for training and the remaining 20% for testing, I want to split every month in that manner. The dataset contains multiple years of data. For every month I want to perform the split.…

python machine-learning train-test-split

asked May 18 '21 at 10:02

Herwini

-2

votes

1 answer

Expected 2D array, got 1D array instead. How to solve it in Linear Regression?

I don't understand why this error has occurred. I am doing exactly what my instructor is doing. Please give me a solution or tell me where the mistake is. Thank you. This is my code

sklearn-pandas train-test-split

asked Dec 28 '20 at 05:56

Jobaear Hossain

-2

votes

1 answer

How to train test split and cross validation

I am still confused about the data validation workflow. As I understand it, when I get a dataset, I split the data into two parts, a training set and a test set, using train_test_split. Then, I perform cross_val_score or cross_val_predict on the training set for model…

python machine-learning data-science cross-validation train-test-split

asked Aug 27 '20 at 03:23

indyspace

-2

votes

1 answer

Distribution of training, validation, and test set?

I want to ask about the distribution of the train, validation, and test sets. Let's assume I want to make a binary ResNet classifier with two classes, 'cat' and 'dog'. Assume the names of the images in each class are: cat: a, b, c, d, e dog: f, g, h, i,…

classification training-data train-test-split

asked Mar 03 '20 at 16:06

Faiz

-2

votes

3 answers

Using train_test_split over a list of dataframes

I have 12 feature data frames named X[0], X[1], ... up to X[11], and 12 corresponding response data frames named y[0] to y[11]. I need to split them into train and test data frames using the train_test_split function. As this processes empty…

python python-3.x scikit-learn sklearn-pandas train-test-split

asked Nov 15 '18 at 13:24

Batman

-2

votes

1 answer

Iris data set split function not compiling?

I am attempting to randomly split 2 data sets (numpy arrays) using the train_test_split function but for some reason my code is not compiling. # Iris data set, hello world of Machine Learning #classes: possible outcomes #label: for each data point,…

python machine-learning scikit-learn train-test-split

asked Apr 04 '18 at 00:59

Casale
