
I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o, but h2o has an implementation of random forest, `H2ORandomForestEstimator`. Can we implement a decision tree in h2o by tuning certain input arguments of random forest? We can do that in scikit-learn (a popular Python library for machine learning).

Ref link: Why is Random Forest with a single tree much better than a Decision Tree classifier?

In scikit-learn, the code looks something like this:

from sklearn.ensemble import RandomForestClassifier
RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)

Do we have an equivalent of this code in h2o?

ishaan arora

2 Answers


You can use H2O's random forest (`H2ORandomForestEstimator`): set `ntrees=1` so that it builds only one tree, set `mtries` to the number of features (i.e., columns) in your dataset, and set `sample_rate=1`. Setting `mtries` to the number of features means the algorithm will consider all of your features at each level of the decision tree.
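
A minimal sketch of what that looks like in the H2O Python API (the frame `train` and the names `predictors` and `response` are placeholders for your own data):

from h2o.estimators import H2ORandomForestEstimator

# a "decision tree" as a single-tree random forest;
# train, predictors and response are placeholders for your own data
dt = H2ORandomForestEstimator(ntrees = 1,                # build only one tree
                              mtries = len(predictors),  # consider every feature at each split
                              sample_rate = 1)           # use every row (no bagging)
dt.train(x = predictors, y = response, training_frame = train)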

Here is more information about `mtries`: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/mtries.html

Lauren
  • What if I want to use `categorical_encoding = 'enum'` in my H2ORandomForestEstimator? Due to H2O's categorical encoding, I do not know the final number of features the model will use for prediction. How can I set `mtries` in this case so that the model uses all features, as you suggested? – dnks23 Aug 13 '18 at 13:24
  • Please see: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/categorical_encoding.html?highlight=categorical_encoding – Lauren Aug 13 '18 at 14:42
  • This does not answer my question. I asked: due to `categorical_encoding='enum'`, I do not know the number of features used for training, and therefore I cannot set the `mtries` parameter as you suggested. How do I set `mtries` in this case? – dnks23 Aug 13 '18 at 15:52
  • Set `mtries` to the number of original features in the dataset. Please read the linked description of `categorical_encoding='enum'` to understand why (i.e., the encoding leaves the dataset as-is and provides a mapping from strings to integers). – Lauren Aug 14 '18 at 14:28
  • If I set `mtries` to the number of predictors (21), I get an error saying that `mtries` can be in the interval [1,16[ but not 21. – dnks23 Aug 30 '18 at 06:33
  • @dnks23 How many predictor columns are there in the training dataset you passed to your algorithm? If your training set only has 16 columns, then you can specify `mtries` of at most 16, for example. – Lauren Aug 30 '18 at 15:42
  • As I said, I have `len(train_data.columns) == 21` predictors. If I set `mtries` to this number, I get the error... – dnks23 Aug 31 '18 at 06:57
  • @dnks23 I am having the same problem. I have 24 columns but I get an error saying the interval is [1,21[. Were you able to solve it? – nicolezk Oct 06 '20 at 17:37
  • @dnks23 I found it out! When you use `mtries=-2` it uses all the available columns. When I did that, the algorithm gave a warning that it was dropping some constant columns. That's why the number of columns wasn't matching. – nicolezk Oct 06 '20 at 18:10
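
Following up on that last comment, a hedged sketch of the shortcut: `mtries = -2` tells H2O to use all available columns, so you don't have to count features yourself, and columns dropped as constant no longer cause a mismatch:

from h2o.estimators import H2ORandomForestEstimator

# mtries = -2 means "use all available features at each split",
# even after H2O drops constant columns
dt = H2ORandomForestEstimator(ntrees = 1, mtries = -2, sample_rate = 1)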

To add to Lauren's answer: based on PUBDEV-4324 (Expose Decision Tree as a stand-alone algo in H2O), both DRF and GBM can do the job, with GBM being marginally easier:

titanic_1tree = h2o.gbm(x = predictors, y = response,
                        training_frame = titanicHex,
                        ntrees = 1,          # a single tree
                        min_rows = 1,        # allow leaves with a single row
                        sample_rate = 1,     # use every row
                        col_sample_rate = 1, # use every column
                        max_depth = 5,
                        seed = 1)

which creates a decision tree at most 5 splits deep (`max_depth = 5`) on the Titanic dataset (available here: https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv).
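
If you are working from the Python API instead of R, a rough equivalent (a sketch, reusing the `predictors`, `response`, and `titanicHex` names from above as placeholders) would be:

from h2o.estimators import H2OGradientBoostingEstimator

titanic_1tree = H2OGradientBoostingEstimator(ntrees = 1, min_rows = 1,
                                             sample_rate = 1, col_sample_rate = 1,
                                             max_depth = 5, seed = 1)
titanic_1tree.train(x = predictors, y = response, training_frame = titanicHex)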

Starting with release 3.22.0.1 (Xia), it's possible to extract tree structures from H2O models:

titanicH2oTree = h2o.getModelTree(model = titanic_1tree, tree_number = 1)
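
The Python API exposes the same functionality via the `H2OTree` class (a sketch, assuming a model trained as above; note that trees are indexed from 0 in Python):

from h2o.tree import H2OTree

# extract the first (and only) tree of the single-tree model
titanicH2oTree = H2OTree(model = titanic_1tree, tree_number = 0)
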
topchef