
I want to change the following part of my code, which uses train_test_split:

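from sklearn.model_selection import train_test_split
from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, TrainerConfig
from pytorch_tabular.models import NodeConfig

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=100, test_size=0.2)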

train_data = X_train.copy()
train_data.loc[:, 'target'] = y_train

test_data = X_test.copy()
test_data.loc[:, 'target'] = y_test


data_config = DataConfig(
    # target should always be a list. Multi-targets are only supported for
    # regression. Multi-Task Classification is not implemented.
    target=['target'],
    # Only the feature columns: train_data.columns would also include 'target'
    continuous_cols=X_train.columns.tolist(),
    categorical_cols=[],
    normalize_continuous_features=True
)
trainer_config = TrainerConfig(
    auto_lr_find=True,
    batch_size=64,
    max_epochs=10,

)
optimizer_config = {
    'optimizer': 'Adam',
    'optimizer_params': {'weight_decay': 0, 'amsgrad': False},
    'lr_scheduler': None,
    'lr_scheduler_params': {},
    'lr_scheduler_monitor_metric': 'valid_loss',
}

model_config = NodeConfig(
    task="classification",
    num_layers=2,
    num_trees=512,
    learning_rate=1,
    embed_categorical=True,

)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)

tabular_model.fit(train=train_data, test=test_data)

pred = tabular_model.predict(test_data)

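pred['prediction'] = pred['prediction'].astype(int)
# Cap predictions at 1 in the 'prediction' column only;
# pred.loc[mask] = 1 would overwrite every column of the matching rows
pred.loc[pred['prediction'] >= 1, 'prediction'] = 1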

print_metrics(test_data['target'], pred["prediction"].astype('int'), tag="Holdout")

Instead of a single train/test split, I want to use k-fold cross-validation with k = 5 or 10.

The code above is my complete example using train_test_split. Thank you for your advice.
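With k = 5, k-fold cross-validation cuts the rows into five folds, trains on four of them and tests on the fifth, and repeats until every fold has served as the test set exactly once; the reported score is the average over the five runs. A minimal illustration with scikit-learn's `KFold` on ten made-up rows, chosen only so the fold assignments are easy to read:

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples (made-up data) so the fold assignments are easy to read
X = np.arange(20).reshape(10, 2)

# Without shuffling, KFold cuts the rows into k consecutive blocks and uses
# each block once as the test fold while training on the remaining rows
kf = KFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train={train_idx} test={test_idx}")
# first line printed: fold 0: train=[2 3 4 5 6 7 8 9] test=[0 1]
```

Every row index lands in exactly one test fold, which is what distinguishes k-fold from a single `train_test_split`.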

zizi
  • What have you tried, and where are you stuck? – D.L Aug 13 '22 at 15:39
  • I have not used the k-fold method before, so I don't know how to change my code to use k-fold instead of train_test_split. I think this part is necessary: `train_data = X_train.copy(); train_data.loc[:, 'target'] = y_train; test_data = X_test.copy(); test_data.loc[:, 'target'] = y_test` – zizi Aug 13 '22 at 15:50
  • Does this answer your question? [Splitting a data set for K-fold Cross Validation in Sci-Kit Learn](https://stackoverflow.com/questions/58821599/splitting-a-data-set-for-k-fold-cross-validation-in-sci-kit-learn) – Ignatius Reilly Aug 13 '22 at 20:17
  • There's pertinent documentation in scikit-learn, and material you can search for on the internet. Please use SO only after you have done [a great deal of research](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users) yourself, have tried something (that you can show), and are stuck at a point where you can't actually solve the problem without asking other people to do the work/research for you. – Ignatius Reilly Aug 13 '22 at 20:22
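The `train_data`/`test_data` construction quoted in the comments can move inside a loop over folds. A minimal sketch, assuming `X` is a pandas DataFrame of features and `y` the matching labels, as the question's use of `.copy()` and `.loc` suggests. `kfold_frames` is a hypothetical helper name, and `StratifiedKFold` is swapped in for plain `KFold` because the task is classification:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def kfold_frames(X, y, n_splits=5, random_state=100):
    """Hypothetical helper: yield (fold, train_data, test_data) per fold.

    Each frame holds the feature rows plus a 'target' column, the same
    layout the train_test_split version builds by hand.
    """
    y = np.asarray(y)
    # Stratified folds keep the class balance of y in every fold;
    # use KFold instead for a regression target.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
        # The splitter returns positions, so index with iloc, not loc
        train_data = X.iloc[train_idx].copy()
        train_data["target"] = y[train_idx]
        test_data = X.iloc[test_idx].copy()
        test_data["target"] = y[test_idx]
        yield fold, train_data, test_data


# Toy stand-ins for the question's X and y (made-up data)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
y = pd.Series(rng.integers(0, 2, size=100))

for fold, train_data, test_data in kfold_frames(X, y, n_splits=5):
    print(fold, train_data.shape, test_data.shape)
    # Here, rebuild the configs and TabularModel as in the question, with
    # continuous_cols=X.columns.tolist() so 'target' is not used as a feature, then:
    #   tabular_model.fit(train=train_data, test=test_data)
    #   pred = tabular_model.predict(test_data)
    # and store one score per fold instead of a single "Holdout" score.
```

Use `n_splits=10` for 10-fold and report the mean (and spread) of the per-fold scores. Creating a fresh `TabularModel` in each iteration is the safe choice, so that no fold can start from weights already trained on its own test rows.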

1 Answer


Here is an example of the k-fold method:



from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)

# Split off 40% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

print(X_train.shape, y_train.shape)  # (90, 4) (90,)
print(X_test.shape, y_test.shape)    # (60, 4) (60,)

# Fit a linear SVM on the training part and report accuracy on the test part
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
print(clf.score(X_test, y_test))

Result (in this example):

0.9666666666666667

The example is from here: https://scikit-learn.org/stable/modules/cross_validation.html
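For an actual k-fold version of the same iris/SVC example, the scikit-learn page linked above computes cross-validated scores with `cross_val_score`. A minimal sketch along those lines with `cv=5` (pass `cv=10` for ten folds):

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(kernel="linear", C=1)

# With an integer cv and a classifier, cross_val_score uses StratifiedKFold:
# the model is refit on 4 folds and scored (accuracy) on the held-out fold, 5 times
scores = cross_val_score(clf, X, y, cv=5)
print(scores)
print(f"{scores.mean():.2f} accuracy with a standard deviation of {scores.std():.2f}")
```

This yields one accuracy per fold rather than the single hold-out number above.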

D.L
  • You just copy-pasted the example without reading the documentation you linked: that's an example of the standard (not k-fold) method. After the explanation of the advantages of k-fold over that approach comes the [actual example on k-fold CV](https://scikit-learn.org/stable/modules/cross_validation.html#computing-cross-validated-metrics) – Ignatius Reilly Aug 13 '22 at 20:11
  • @IgnatiusReilly, that is why I referenced the link in the first instance and only added the example, because the docs are far more comprehensive. The OP specifically says "I want to change my code"... – D.L Aug 14 '22 at 06:50
  • But the code you posted **is not an example of k-fold** as you stated. Nor is it a suggestion on how to change the OP's code, since your example uses **the same** kind of split the OP was using in his own code. – Ignatius Reilly Aug 14 '22 at 15:23
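The loop that `cross_val_score` runs can also be written out by hand, which is the form needed when a model does not follow the scikit-learn estimator API (assumed here to include the question's `TabularModel`). A sketch on the same iris/SVC data that checks the hand-written loop against `cross_val_score`:

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(kernel="linear", C=1)

# Same splitter cross_val_score picks for a classifier with cv=5 (no shuffling)
skf = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    model = clone(clf)  # fresh, unfitted copy for each fold
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))

manual_scores = np.array(manual_scores)
print(manual_scores)
print(np.allclose(manual_scores, cross_val_score(clf, X, y, cv=5)))
```

Cloning inside the loop keeps each fold independent; the final line confirms the manual loop matches the library helper.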