
I am trying to train a decision tree model using H2O. I am aware that no dedicated decision tree algorithm exists in H2O.

This is the code I use for the GBM algorithm in H2O, but I cannot train a decision tree the same way, because there is no decision tree algorithm in H2O:

// build GBM parameters for the REST API
GBMParametersV3 gbmParams = new GBMParametersV3();
gbmParams.trainingFrame = H2oApi.stringToFrameKey("train");
gbmParams.validationFrame = H2oApi.stringToFrameKey("test");

// point the model at the label column
ColSpecifierV3 responseColumn = new ColSpecifierV3();
responseColumn.columnName = ATT_LABLE_IRIS;
gbmParams.responseColumn = responseColumn;

// start the GBM training job
GBMV3 gbmBody = h2o.train_gbm(gbmParams);
...

So, how can I use a decision tree algorithm in H2O?

liyuhui
  • You can use a one-tree “not random forest” by tweaking the parameters so the random forest isn’t random and uses all the data. Don’t use GBM. – TomKraljevic Aug 29 '18 at 14:02
  • this question has been asked before here: https://stackoverflow.com/questions/50740316/implementing-a-decision-tree-using-h2o – Lauren Aug 29 '18 at 14:59
  • As you said, I should set ntrees=1, mtries=number of features and sample_rate=1 in random forest, so I can use it as a decision tree. But in some situations I DON'T know the number of columns; what should I do in that case? – liyuhui Aug 30 '18 at 06:48
  • `mtries` is not the number of features for the whole tree, but the number of randomly selected variables evaluated at each split. If you don't know the number of columns to set, you can go with `-1`, which defaults to one third of the columns in regression. – Sixiang.Hu Aug 30 '18 at 10:51
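
For reference, here is a minimal sketch of the one-tree "not random" random forest suggested in the comments, written against the same REST bindings used in the question. It assumes the generated bindings expose a DRFParametersV3 class with camelCase fields (`ntrees`, `mtries`, `sampleRate`) and a `train_drf` call analogous to `train_gbm`; check the generated classes for the exact names.

// Sketch only: class, field and method names below are assumed to mirror
// the GBM bindings shown in the question; verify them against the generated API.
DRFParametersV3 drfParams = new DRFParametersV3();
drfParams.trainingFrame = H2oApi.stringToFrameKey("train");
drfParams.validationFrame = H2oApi.stringToFrameKey("test");

ColSpecifierV3 drfResponse = new ColSpecifierV3();
drfResponse.columnName = ATT_LABLE_IRIS;
drfParams.responseColumn = drfResponse;

drfParams.ntrees = 1;      // a single tree
drfParams.mtries = -1;     // -1 = default number of columns considered per split
drfParams.sampleRate = 1;  // use every row, so nothing is sampled randomly

DRFV3 drfBody = h2o.train_drf(drfParams);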

1 Answer


Based on PUBDEV-4324 - Expose Decision Tree as a stand-alone algo in H2O, the most straightforward way would be to use GBM:

# a single tree (ntrees = 1) grown on every row and every column,
# so the "GBM" behaves like a plain decision tree
titanic_1tree = h2o.gbm(x = predictors, y = response,
                        training_frame = titanicHex,
                        ntrees = 1, min_rows = 1, sample_rate = 1,
                        col_sample_rate = 1,
                        max_depth = 5,
                        seed = 1)

which creates a decision tree that is at most 5 splits deep (max_depth = 5) on the Titanic dataset (available here: https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv).
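
If you are calling H2O through the Java REST bindings as in the question, the same one-tree GBM setup should translate roughly as follows. This is a sketch under the assumption that GBMParametersV3 exposes the corresponding camelCase fields (`ntrees`, `minRows`, `sampleRate`, `colSampleRate`, `maxDepth`, `seed`); verify the names against the generated bindings.

// Sketch only: field names are assumed, not verified against the bindings.
// Reuses the gbmParams object already configured in the question.
gbmParams.ntrees = 1;         // a single tree
gbmParams.minRows = 1;
gbmParams.sampleRate = 1;     // all rows
gbmParams.colSampleRate = 1;  // all columns
gbmParams.maxDepth = 5;       // at most 5 splits deep
gbmParams.seed = 1;

GBMV3 gbmBody = h2o.train_gbm(gbmParams);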

Starting with release 3.22.0.1 (Xia), it is possible to extract tree structures from H2O models:

titanicH2oTree = h2o.getModelTree(model = titanic_1tree, tree_number = 1)
topchef