I am using a GBM model and want to compare it with other machine learning methods. I run it with 5-fold cross-validation. As I understand it, H2O splits the data into 5 folds, then uses one fold for testing and the remaining folds for training. How can I get the 5 folds of data from the GBM in the H2O library?

I am using Python.

from h2o.estimators import H2OGradientBoostingEstimator

folds = 5
cars_gbm = H2OGradientBoostingEstimator(nfolds=folds, seed=1234)
Subbu VidyaSekar
cnp

1 Answer

There are two ways:

  1. You can create and specify the folds manually.
  2. You can ask H2O to save the fold assignments (the fold ID each row belongs to) and return them as a single column of data by setting keep_cross_validation_fold_assignment=True.

Here are some code examples:

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# Import cars dataset
cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
x = ["displacement","power","weight","acceleration","year"]
y = "economy_20mpg"
nfolds = 5

First way:

# Create a k-fold column and append to the cars dataset
# Or you can use an existing fold id column
cars["fold_id"] = cars.kfold_column(n_folds=nfolds, seed=1)

# Train a GBM
cars_gbm = H2OGradientBoostingEstimator(seed=1, fold_column = "fold_id",
              keep_cross_validation_fold_assignment=True)
cars_gbm.train(x=x, y=y, training_frame=cars)

# View the fold ids (identical to cars["fold_id"])
print(cars_gbm.cross_validation_fold_assignment())

Second way:

# Train a GBM & save fold IDs
cars_gbm = H2OGradientBoostingEstimator(seed=1, nfolds=nfolds,
              keep_cross_validation_fold_assignment=True)
cars_gbm.train(x=x, y=y, training_frame=cars)

# View the fold ids
print(cars_gbm.cross_validation_fold_assignment())
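
Once you have the fold IDs, you can rebuild the train/test split for each fold yourself, which lets you feed the same folds to other methods. Here is a minimal sketch of that step in plain Python, with no H2O required. The helper name folds_to_splits and the example fold IDs are hypothetical. It assumes you have already converted the single-column frame to a plain Python list of integers (for example by going through as_data_frame(), which needs pandas).

```python
from collections import defaultdict


def folds_to_splits(fold_ids):
    """Turn a per-row list of fold IDs into (fold, train_idx, test_idx) tuples.

    Hypothetical helper for illustration; it is not part of the H2O API.
    """
    # Group row indexes by the fold they were assigned to
    rows_by_fold = defaultdict(list)
    for row, fold in enumerate(fold_ids):
        rows_by_fold[fold].append(row)

    splits = []
    for fold in sorted(rows_by_fold):
        # The rows of the current fold are held out for testing
        test_idx = rows_by_fold[fold]
        held_out = set(test_idx)
        # Every other row is used for training
        train_idx = [row for row in range(len(fold_ids)) if row not in held_out]
        splits.append((fold, train_idx, test_idx))
    return splits


# Made-up fold IDs for 10 rows (assumption: real IDs would come from
# cross_validation_fold_assignment(), converted to a list)
example_fold_ids = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

for fold, train_idx, test_idx in folds_to_splits(example_fold_ids):
    print(fold, len(train_idx), test_idx)
```

Each tuple gives the row indexes to train on and the row indexes to test on for one fold, so a model from another library can be evaluated on exactly the same partition as the GBM.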
Erin LeDell
  • After getting cross_validation_fold_assignment(), do you know how to use it for another model? For example, I will re-run GBM with different parameters (number of trees, stopping_rounds, etc.). Thanks. – cnp Nov 11 '20 at 16:29
  • Answered here: https://stackoverflow.com/questions/64790872/how-to-reuse-cross-validation-fold-assignment-with-gbm-in-h2o-library-with-pyt – Erin LeDell Nov 11 '20 at 18:36