I have a dataset with 100 samples. I want to split it into 70%, 15%, and 15% for train, validation, and test respectively, and then do the same again with different ratios such as 80%, 10%, 10%.
For this I was using the code below, but I don't think it splits the data correctly in the second step: the second call only sees the remaining 85% of the data, so it yields 85% × 85% (72.25%) for training and 85% × 15% (12.75%) for testing, instead of 15% of the whole dataset.
My question: is there a clean way to do the split correctly for any given ratios?
from sklearn.model_selection import train_test_split
# Split Train Test Validate
X_, X_val, Y_, Y_val = train_test_split(X, Y, test_size=0.15, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X_, Y_, test_size=0.15, random_state=42)
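
For comparison, here is a minimal sketch of one way the second step could be made to respect the overall ratios. It converts each ratio into an absolute sample count first and passes those counts as `test_size`, which `train_test_split` accepts as an integer. The helper name `train_val_test_split`, its default ratios (chosen to match the `test_size=0.15` above), and the placeholder `X`/`Y` arrays are assumptions made for illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def train_val_test_split(X, Y, train=0.70, val=0.15, test=0.15, random_state=42):
    """Hypothetical helper: split X, Y by overall train/val/test ratios."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("train, val and test ratios must sum to 1")

    n = len(X)
    # Turn the ratios into absolute counts of the WHOLE dataset, so the
    # second split cannot silently shrink to a fraction of a fraction.
    n_val = int(round(val * n))
    n_test = int(round(test * n))

    # First split: hold out the validation samples.
    X_rest, X_val, Y_rest, Y_val = train_test_split(
        X, Y, test_size=n_val, random_state=random_state
    )
    # Second split: hold out the test samples from what is left.
    # As a fraction this equals test / (1 - val), not test.
    X_train, X_test, Y_train, Y_test = train_test_split(
        X_rest, Y_rest, test_size=n_test, random_state=random_state
    )
    return X_train, X_val, X_test, Y_train, Y_val, Y_test


# Placeholder data with 100 samples (assumed purely for the example).
X = np.arange(200).reshape(100, 2)
Y = np.arange(100)

X_train, X_val, X_test, Y_train, Y_val, Y_test = train_val_test_split(X, Y)
print(len(X_train), len(X_val), len(X_test))
```

Calling the same helper with `train=0.8, val=0.1, test=0.1` would give the second set of ratios. Using integer counts rather than a rescaled fraction sidesteps floating-point rounding in the second call, at the cost of needing the dataset length up front.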