Repeated holdout method

Question

How can I implement a "repeated" holdout method? I implemented the holdout method and got an accuracy, but I need to repeat the holdout 30 times.

Here is my code for the holdout method:

[IN]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y.values.ravel(), random_state=100)
model = LogisticRegression()
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
print("Accuracy: %.2f%%" % (result*100.0))

[OUT]

Accuracy: 49.62%

I have seen a lot of code for repeated methods, but only for k-fold cross-validation, nothing for the holdout method.

Cross validation **is** holdout, it just does hold out in a smart way. — lejlot, Jun 21 '21 at 16:48
What that means to me? How can I do this holdout validation for 30 times? — raideR49, Jun 21 '21 at 16:50

score 0 · Answer 1 · answered Jul 08 '21 at 11:09

For a repeated holdout you could use the ShuffleSplit class from sklearn. A minimal working example (following the naming conventions that you used) might look as follows:

from sklearn.model_selection import ShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Create some artificial data to train on; this can be replaced by your own data
X, Y = make_classification()

rs = ShuffleSplit(n_splits=30, test_size=0.25, random_state=100)
model = LogisticRegression()

for train_index, test_index in rs.split(X):
    X_train, Y_train = X[train_index], Y[train_index]
    X_test, Y_test = X[test_index], Y[test_index]
    model.fit(X_train, Y_train)
    result = model.score(X_test, Y_test)
    print("Accuracy: %.2f%%" % (result*100.0))

n_splits determines how many times the holdout is repeated. test_size determines the fraction of samples that is held out as the test set. In this case 75% of the samples go to the train set and 25% go to the test set. For reproducible results you can set random_state (any integer suffices, as long as you use the same number consistently).
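If you only need a summary over the 30 repeats rather than one printed line per split, a shorter variant might pass the same ShuffleSplit object to cross_val_score. This is a sketch, not part of the original answer: it assumes the default scorer of LogisticRegression (accuracy), and the artificial data with random_state=0 is only a placeholder for your own X and Y.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Artificial placeholder data (assumption); replace with your own X and Y
X, Y = make_classification(random_state=0)

# 30 independent random splits, each with 75% train and 25% test
rs = ShuffleSplit(n_splits=30, test_size=0.25, random_state=100)

# cross_val_score fits a fresh clone of the model on every split and
# returns one test-set accuracy per split
scores = cross_val_score(LogisticRegression(), X, Y, cv=rs)

print("Mean accuracy over %d holdouts: %.2f%% (std: %.2f%%)"
      % (len(scores), scores.mean() * 100.0, scores.std() * 100.0))
```

The scores array keeps the individual accuracies, so you can still inspect each repeat or plot their distribution.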
