Questions tagged [train-test-split]

Questions with this tag are about splitting a machine learning data set into random train and test subsets.

In particular, questions with this tag may ask how to split data using scikit-learn. In scikit-learn, a random split into training and test sets can be computed with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
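As a rough illustration of the helper described above, here is a minimal sketch. The toy arrays, the 25% test fraction, and the seed are assumptions chosen for the example, not values from any question on this page.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 20 samples, 2 features, binary labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 25% of the rows as a test set; random_state makes the
# shuffle reproducible between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

The function shuffles by default and returns the pieces in the order train, test for each array passed in.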

428 questions
-1
votes
1 answer

Train/Test Datasets in Machine Learning

I just have a general question: In a previous job, I was tasked with building a series of non-linear models to quantify the impact of certain factors on the number of medical claims filed. We had a set of variables we would use in all models (eg:…
user2813606
  • 797
  • 2
  • 13
  • 37
-1
votes
2 answers

Does scikit-learn train_test_split preserve relationships?

I am trying to understand this code. I do not understand how if you do: x_validation, x_test, y_validation, y_test = train_test_split(x_validation_and_test, y_validation_and_test... you can later do: (len(x_validation[y_validation == 0]) surely…
schoon
  • 2,858
  • 3
  • 46
  • 78
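One way to check the row-to-label relationship asked about above is a small experiment. This is only a sketch: the arrays are hypothetical stand-ins that reuse the variable names from the excerpt, and the 50/50 split is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: the single feature encodes its own label
# (even -> 0, odd -> 1), so a row/label mismatch would be detectable.
x_validation_and_test = np.arange(20).reshape(-1, 1)
y_validation_and_test = x_validation_and_test[:, 0] % 2

x_validation, x_test, y_validation, y_test = train_test_split(
    x_validation_and_test, y_validation_and_test,
    test_size=0.5, random_state=0,
)

# All arrays passed in one call are shuffled with the same indices,
# so row i of x_validation still belongs to label i of y_validation.
aligned = bool(np.all(x_validation[:, 0] % 2 == y_validation))

# Boolean masking with the labels therefore selects the matching rows.
n_class_0 = len(x_validation[y_validation == 0])
print(aligned, n_class_0)
```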
-1
votes
1 answer

Extract smaller table from pivot table pandas

I want to split the following pivot table into training and testing sets (to evaluate recommendation system), and was thinking of extracting two tables with non-overlapping indices (userID) and column values (ISBN). How can I split it properly?…
Helen Grey
  • 439
  • 6
  • 16
-1
votes
1 answer

How can I split a matrix into training and testing data whilst ensuring there is at least one value present in each row and column of the training matrix?

I want to randomly split a sparse matrix into training and testing data of the same dimensions whilst ensuring there are no columns or rows full of zeros in the training set. For my algorithms to work I need at least one value in each row and column…
-1
votes
2 answers

Test Train Split : error

How can I split my df: X=Final_df.drop('survived',axis=1) Y=Final_df['survived'] X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=123 ) logreg=LogisticRegression() logreg.fit(X_train,Y_train) train,test =…
-2
votes
1 answer

Create X train and Y Train for CSV dataset in Python

I would like to ask about creating x_train, y_train and x_test, y_test from a CSV data set that has already been split into data_train.csv and data_test.csv
-2
votes
1 answer

Sorting train_test_split data by numpy array

I want to split the following numpy arrays for training and testing: X, y and qid. X is a set of featurized documents - shape: (140, 105). qid is a set of query identifiers for each document - shape: (140,). y is a set of labels for each (X, qid) pair…
krakken
  • 9
  • 5
-2
votes
1 answer

What is the difference between train and test

from sklearn.model_selection import train_test_split x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=87) plt.scatter(x_train[:, 0], x_train[:,1], c=y_train) Can someone explain to me about the code, what is…
-2
votes
1 answer

Stratified 5-fold cross validation for continuous-value target: The least populated class in y has only 1 member, which is too few

For this code: #x_train, x_val, y_train, y_val=train_test_split(x,y,test_size=0.3, random_state=42) x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=42, stratify=y) train = [x_train, y_train] I get the following…
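The error in the title typically appears when stratify receives a continuous target, because almost every value then forms its own one-member class. A common workaround, sketched here under assumptions (synthetic data, five quantile bins, a 25% validation fraction), is to stratify on a binned copy of the target while keeping the original values for training.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic continuous target (illustrative only).
rng = np.random.default_rng(42)
x = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Bin the target into 5 quantile-based groups; the bin ids act as
# pseudo-classes for stratification only.
edges = np.quantile(y, [0.2, 0.4, 0.6, 0.8])
y_binned = np.digitize(y, edges)

# Stratify on the bins, but split the original continuous y.
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.25, random_state=42, stratify=y_binned
)
print(len(y_train), len(y_val))
```

The number of bins is a judgment call: each bin needs enough members to be represented on both sides of the split.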
-2
votes
1 answer

Train/test division for each month in time series in python

I have time series data. Instead of using the first 80% of the data for training and the remaining 20% for testing, I want to split every month in that manner. The dataset contains multiple years of data. For every month I want to perform the split.…
Herwini
  • 371
  • 1
  • 19
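A per-month chronological split like the one described above can be sketched with pandas groupby. The daily index, the column name value, and the 80/20 cut are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering two years.
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

train_parts, test_parts = [], []
# Group by (year, month) so the same month in different years is
# treated as a separate block.
for _, month_df in df.groupby([df.index.year, df.index.month]):
    cut = int(len(month_df) * 0.8)  # first 80% of the month's rows
    train_parts.append(month_df.iloc[:cut])
    test_parts.append(month_df.iloc[cut:])

train = pd.concat(train_parts)
test = pd.concat(test_parts)
print(len(train), len(test))
```

No shuffling is used here, so within each month the training rows always precede the test rows.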
-2
votes
1 answer

Expected 2D array, got 1D array instead. How to solve it in Linear Regression?

I don't understand why this error has occurred. I am doing exactly what my instructor is doing. Please give me a solution or tell me where the mistake is. Thank you. This is my code
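The asker's code is not shown in the excerpt, so the following is only a guess at the usual cause: scikit-learn estimators expect the feature array to have shape (n_samples, n_features), and a single feature stored as a 1D array triggers this message. The data below is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up single-feature data following y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,): 1D, rejected by fit()
y = 2.0 * x + 1.0

# Reshape to one column so the shape becomes (n_samples, 1).
X = x.reshape(-1, 1)

model = LinearRegression().fit(X, y)

# predict() also needs 2D input, hence the nested brackets.
pred = model.predict(np.array([[5.0]]))
print(X.shape, pred)
```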
-2
votes
1 answer

How to train test split and cross validation

I am still confused about the data validation workflow. As I understand, when I get a dataset, I split the data into two parts, training set and test set, using train_test_split. Then, I perform cross_val_score or cross_val_predict on training set for model…
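The workflow described above (hold out a test set, cross-validate on the training portion, then evaluate once on the held-out data) might look roughly like this. The synthetic data set, the logistic regression model, and the 5 folds are assumptions for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Step 1: set aside a test set that is not touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 2: estimate performance with cross-validation on the training set only.
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Step 3: refit on the full training set and score once on the test set.
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)
print(cv_scores.mean(), test_score)
```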
-2
votes
1 answer

Distribution of training, validation, and test set?

I want to ask about the distribution of the train, validation, and test sets. Let's assume I want to make a binary ResNet classifier with two classes, 'cat' and 'dog'. Assume the image names in each class are: cat: a, b, c, d, e dog: f, g, h, i,…
Faiz
  • 21
  • 1
  • 4
-2
votes
3 answers

Using train_test_split over a list of dataframes

I have 12 feature data frames named X[0], X[1]... through X[11] and, corresponding to them, 12 response data frames y[0] to y[11]. I need to split them into train and test data frames using the train_test_split function. As this processes empty…
-2
votes
1 answer

Iris data set split function not compiling?

I am attempting to randomly split 2 data sets (numpy arrays) using the train_test_split function but for some reason my code is not compiling. # Iris data set, hello world of Machine Learning #classes: possible outcomes #label: for each data point,…