6

I have a rather simple question, but have not been able to find a documented solution anywhere.

I'm currently building a pipeline with H2O models and as part of the process I need to write some basic information about each trained model into a table.

Let's say I have something like:

model = H2ODeepLearningEstimator(...)
model.train(...)

After doing this, I want to pull the type of model from the model object. I.e., I am looking for something like:

model.getType()

which then returns a string "H2ODeepLearningEstimator" or equivalently "deeplearning" which H2O appears to use internally as the model type identifier. I would also like to get other details, such as whether it was a regression or classification model. I don't see a parameter where this information is exposed.

If I run model.save_model_details, for example, I get:

H2ODeepLearningEstimator :  Deep Learning
Model Key:  Grid_DeepLearning_py_4_sid_a02a_model_python_1502450758585_2_model_0


ModelMetricsRegression: deeplearning
** Reported on train data. **

MSE: 19.5334650304
RMSE: 4.4196679774
MAE: 1.44489752843
RMSLE: NaN
Mean Residual Deviance: 19.5334650304

ModelMetricsRegression: deeplearning
** Reported on validation data. **
...
...

Presumably model.save_model_details builds up this summary from individual parameters. I would like to access these (and similar) parameters directly via the model object (for performance metrics this is possible via model.mse(), model.mae() etc.)

Karl
  • I guess you figured it out by now. You specify classification or regression with the distribution parameter in the model. Above you have the default, which is distribution='gaussian', so a regression task. – TinaW Apr 08 '18 at 14:55

4 Answers

4

You can get some of the individual model metrics for your model based on training and/or validation data. Here is the code snippet:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init(strict_version_check=False, port=54345)
model = H2ODeepLearningEstimator()
rows = [[1,2,3,4,0], [2,1,2,4,1], [2,1,4,2,1], [0,1,2,34,1], [2,3,4,1,0]] * 50
fr = h2o.H2OFrame(rows)
X = fr.col_names[0:4]

## Classification Model
# Converting the response column to a factor makes this a classification task
fr[4] = fr[4].asfactor()
model.train(x=X, y="C5", training_frame=fr)
print('Model Type:', model.type)
print('logloss', model.logloss(valid=False))
print('Accuracy', model.accuracy(valid=False))
print('AUC', model.auc(valid=False))
print('R2', model.r2(valid=False))
print('RMSE', model.rmse(valid=False))
print('Error', model.error(valid=False))
print('MCC', model.mcc(valid=False))

## Regression Model
# Rebuilding the frame leaves the response numeric, so this is a regression task
fr = h2o.H2OFrame(rows)
model.train(x=X, y="C5", training_frame=fr)
print('Model Type:', model.type)
print('R2', model.r2(valid=False))
print('RMSE', model.rmse(valid=False))

Note: I did not pass a validation frame, which is why I set valid=False to get the training metrics. If you have passed a validation frame, you can set valid=True to get the validation metrics as well.
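To make the train/validation switch concrete, here is a minimal sketch of a helper that gathers several metrics in one call. The function name collect_metrics and the default list of metric names are hypothetical; it only assumes that each metric is a method accepting a valid keyword, as in the snippet above, and it skips any name the model does not provide.

```python
def collect_metrics(model, metric_names=("mse", "rmse", "mae"), valid=False):
    """Call each named metric method on the model and return {name: value}.

    Hypothetical helper: assumes each metric method accepts a `valid` keyword.
    Names that the model does not expose as callables are skipped.
    """
    results = {}
    for name in metric_names:
        method = getattr(model, name, None)
        if callable(method):
            results[name] = method(valid=valid)
    return results
```

Calling collect_metrics(model) would return the training metrics, and collect_metrics(model, valid=True) the validation metrics, provided a validation frame was passed to train.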

If you want to see what is inside the model object, you can inspect its parameters as shown below:

model.get_params()
AvkashChauhan
3

The model type is stored in model.type (an attribute, not a method call).

You can see all the methods for a model by typing model. and then pressing the Tab key in the IPython terminal. They are listed alphabetically, which is a good way to find what you're looking for (even if you don't know the exact method name). You can also search for "type" in the Python Module documentation and you'll find it that way as well.

Example:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

# Import a sample binary outcome train/test set into H2O
train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor() 

# Train a GBM
model = H2OGradientBoostingEstimator(distribution="bernoulli", seed=1)
model.train(x=x, y=y, training_frame=train)

Check the model type:

In [3]: model.type
Out[3]: u'classifier'
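Outside IPython, the tab-completion tip can be approximated with the built-in dir(). This is only a sketch: find_members is a hypothetical helper name, and it works on any Python object, not just H2O models.

```python
def find_members(obj, keyword):
    """Return sorted public attribute and method names on obj containing keyword.

    Hypothetical helper: a scripted stand-in for typing `model.` + Tab in IPython.
    The match is case-insensitive and names starting with "_" are ignored.
    """
    keyword = keyword.lower()
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and keyword in name.lower()
    )
```

For example, find_members(model, "type") would list every public name on the model that contains "type".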
Erin LeDell
1

model.algo gives you the model type. As for regression or classification, I don't know off the top of my head, but it's there somewhere. Look in Flow, as it's easier to see the parameter names there, or type model. and scroll through the completions until you see something that looks like it might have it.
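For the use case in the question (one table row per trained model), the pieces mentioned in these answers could be combined roughly as follows. This is a sketch under assumptions: model_summary_row is a hypothetical name, and it assumes the model exposes algo, type and model_id as attributes; anything missing falls back to None rather than raising.

```python
def model_summary_row(model):
    """Build a flat dict describing a trained model, suitable for one table row.

    Hypothetical helper. Assumes `algo`, `type` and `model_id` are attributes
    of the model; missing attributes are reported as None.
    """
    return {
        "model_id": getattr(model, "model_id", None),
        # Class name of the estimator, e.g. "H2ODeepLearningEstimator"
        "estimator": type(model).__name__,
        # Algorithm identifier, e.g. "deeplearning" (assumed attribute)
        "algo": getattr(model, "algo", None),
        # Task category, e.g. "classifier" or "regressor" (assumed attribute)
        "category": getattr(model, "type", None),
    }
```

The class name comes from plain Python introspection, so that field does not depend on the H2O version in use.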

jack
0

(This is on the topic of the question's title so I think it's worth pointing out here. But it's a bit off topic of the actual question, which is referring to H2O binary models, not POJO and MOJO models.)

For H2O POJO and MOJO models, the method to use is getModelCategory().

See http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/hex/genmodel/easy/EasyPredictModelWrapper.html#getModelCategory()

TomKraljevic