Questions tagged [train-test-split]

Questions with this tag are about how to split a machine learning data set into random train and test subsets.

In particular, questions with this tag often aim at a better understanding of how to split data with scikit-learn. In scikit-learn, a random split into training and test sets can be computed quickly with the train_test_split helper function.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
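
As a rough illustration of the helper described above, here is a minimal sketch of a random split. The toy arrays, the 80/20 ratio and the random_state value are assumptions chosen for the example and do not come from any question on this page.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 10 samples with 2 features each.
# Row i of X is [2*i, 2*i + 1] and its label is y[i] = i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows as a test set. Passing X and y together
# keeps each feature row aligned with its label after shuffling.
# A fixed random_state makes the shuffle reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # sizes of the two subsets
```

The call returns four objects because each input array is split into a train part and a test part, in the order the inputs were passed.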

428 questions

votes

0 answers

RStudio knit error "incorrect number of dimensions"

I've encountered this knit issue with RStudio. I have a dataset with dimensions (543, 31) and I split it into train and test with: set.seed(1); train = sample(c(TRUE, FALSE), nrow(dataset), rep = TRUE); test = (!train); y.test = y[test] And then I applied…

r train-test-split

asked Mar 15 '18 at 05:14

efsee

votes

1 answer

How to split a dataset into train and test and merge their corresponding "class" in R

I am using the Wisconsin dataset, which has two categorical columns, IDs and class. In order to carry out classification I must drop these two columns from the dataframe and then split the dataset into train and test (80%:20%). I have this done but…

r classification knn train-test-split

asked Mar 10 '18 at 12:12

Naomi Breslin

votes

2 answers

KeyError when trying to randomize a column of a dataframe

Minimal Example: Consider this dataframe temp: temp = pd.DataFrame({"A":[1,2,3,4,5,6,7,8,9,10],"B":[2,3,4,5,6,7,8,9,10,11],"C":[3,4,5,6,7,8,9,10,11,12]}) >>> temp A B C 0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7 5…

python pandas numpy train-test-split

asked Feb 16 '18 at 07:42

Mooncrater


votes

1 answer

Behaviour of train_test_split() from Scikit-learn

I am curious how the train_test_split() method of Scikit-learn will behave in the following scenario: An imaginary dataset: id, count, size 1, 4, 8 2, 5, 9 3, 6, 0 say I would divide it into two separate sets like this (keeping 'id' in both): id,…

scikit-learn train-test-split

asked Dec 04 '17 at 13:48

NG.

votes

1 answer

Convert float value to integers in Pandas dataframe while ignoring null values

I have two separate CSV files that I read into a pandas dataframe. I've already done a bit of cleaning and joined the tables by their date column. I have another column called 'ExerciseTime' and converted the imported time format of the time of day…

python pandas train-test-split

asked Oct 02 '17 at 19:51

DEB

votes

1 answer

Create train and test variables from loaded arff file

I want to perform multilabel classification. I have a dataset in arff format which I load. However, I don't know how to convert the imported data to X and y vectors in order to apply sklearn's train_test_split. How can I get X and y? data, meta =…

python arff multilabel-classification train-test-split

asked Sep 05 '17 at 15:10

msoares

votes

1 answer

Wrong train/test split strategy

The question is about a wrongly chosen strategy for train/test splitting in a RandomForest model. I know choosing the test set this way gives the wrong output but I would like to know why. (The model looks at previous days of data and tries to…

random-forest train-test-split

asked Aug 30 '17 at 08:21

DBSE

votes

1 answer

Train Test Split for a list of dataframes - Pandas

I have a list of DataFrames that I want to split into train and test sets. For a single DataFrame, I could do the following, Get the length of test split split_point = len(df)- 125 and then, train, test = df[0:split_point], df[split_point:] This…

python pandas dataframe train-test-split

asked Jul 19 '17 at 21:05

i.n.n.m


votes

1 answer

Problems with the random_state parameter on data splitting with sklearn

When I look up the random_state parameter in sklearn's documentation, this is what I find: random_state : int or RandomState Pseudo-random number generator state used for random sampling. I don't understand very well what it is. The accuracy…

python machine-learning scikit-learn train-test-split

asked Mar 24 '17 at 13:57

Borja Fernández Antelo

votes

0 answers

How to split the data in Python and predict the value of next month

I have a dataset for which I need to predict the energy consumption. I have the September data and need to predict the KWH values for October. How do I write Python code where the September data would be my train…

python-3.x split predict python-datetime train-test-split

asked Mar 06 '17 at 10:00

Anagha


votes

1 answer

ValueError: bad input shape (60, 4) Iris dataset train_test_split

I received an input shape error when using train_test_split for iris. I don't understand why. I have tested other datasets. train_test_split should handle this shape. Any suggestions? Thanks # Decision Tree Classifier from sklearn import…

input shapes train-test-split

asked Mar 04 '17 at 17:33

Muten_Roshi

votes

2 answers

How to get the AUC result using scikit

Hi, I want to combine a train/test split with cross-validation and get the results as AUC. My first approach works, but it reports accuracy. # split data into train+validation set and test set X_trainval, X_test, y_trainval, y_test =…

python-3.x scikit-learn cross-validation train-test-split

asked Jan 20 '17 at 17:05

xav

-1

votes

0 answers

Getting ValueError (inconsistent number of samples) on X_train, y_train even when the shapes of X_train and y_train are the same

Most likely bug -> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) the fit call to GridSearchCV -> gs_mnb.fit(X_train, y_train) here's the pipeline used in my code -> pipe_mnb = Pipeline([ ('vect',…

python machine-learning nlp train-test-split

asked Aug 07 '23 at 21:01

YuvrajSingh

-1

votes

3 answers

Why train/test split in ML?

I can't understand why we need to split the dataset in machine learning. And why does train_test_split give four outputs (x_train, x_test, y_train, y_test)? I have watched many videos and read some blogs; they explain a lot of reasons. No one agree…

machine-learning scikit-learn data-science train-test-split

asked Apr 25 '23 at 05:47

xr-adeel7

-1

votes

1 answer

How to remove cross-validation with train_test_split?

My code: X = data['text_with_tokeniz_lemmatiz'] y = data['toxic'] X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, train_size=0.8, test_size=0.2, shuffle=False, random_state=12345) X_valid, X_test, y_valid, y_test = train_test_split(X_tmp,…

cross-validation train-test-split

asked Jan 09 '23 at 21:12

Kirill
