
This is my situation. I have over 400 features, many of which are probably useless and often zero. I would like to be able to:

  • train a model with a subset of those features
  • query that model for the features actually used to build it
  • build an H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict)
  • pass this newly constructed frame to H2OModel.predict() to get a prediction
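A minimal sketch of the frame-building step, assuming the model's feature names are already known (for example from the `names` list discussed below) and that each sparse row arrives as a Python dict of {feature_name: value} holding only the non-zero entries. `sparse_rows_to_columns` is a hypothetical helper, not part of h2o. It fills every omitted feature with 0.0 and drops keys the model does not know about, producing a column-oriented dict.

```python
from typing import Dict, List, Mapping, Sequence


def sparse_rows_to_columns(
    feature_names: Sequence[str],
    sparse_rows: Sequence[Mapping[str, float]],
) -> Dict[str, List[float]]:
    """Turn sparse rows into a column-oriented dict of dense lists.

    Each row lists only its non-zero features; any feature the model expects
    but the row omits is filled with 0.0. Keys not in `feature_names` are
    ignored, so the result has exactly the model's feature columns, in order.
    """
    return {
        name: [row.get(name, 0.0) for row in sparse_rows]
        for name in feature_names
    }


if __name__ == "__main__":
    features = ["f1", "f2", "f3"]
    rows = [{"f3": 2.5}, {"f1": 1.0, "unused": 9.0}]
    print(sparse_rows_to_columns(features, rows))
    # {'f1': [0.0, 1.0], 'f2': [0.0, 0.0], 'f3': [2.5, 0.0]}
```

With a running cluster, the result could then be passed to h2o.H2OFrame(...), which accepts a dict of column lists, and the resulting frame to model.predict(). I have not checked how H2O types an all-zero column, so comparing the new frame's types against the training frame's may be worthwhile.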

I am pretty sure what I found (below) is unsupported, but it works for now (v3.13.0.341). Is there a more robust, supported way of doing this?

model._model_json['output']['names']

The response variable appears to be the last item in this list.
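A small sketch of using that list, assuming (as observed above for v3.13.0.341) that the response is the last entry, so it only makes sense for supervised models. `split_model_names` is a hypothetical helper; `_model_json` is a private attribute whose layout may change between releases.

```python
from typing import List, Sequence, Tuple


def split_model_names(names: Sequence[str]) -> Tuple[List[str], str]:
    """Split H2O's output 'names' list into (features, response).

    Assumes the response column is the last entry, as observed in the
    question; this is not a documented contract of h2o.
    """
    if not names:
        raise ValueError("expected at least the response column in 'names'")
    *features, response = names
    return list(features), response


if __name__ == "__main__":
    print(split_model_names(["f1", "f2", "label"]))
    # (['f1', 'f2'], 'label')
```

Against a real model this would be called as `features, response = split_model_names(model._model_json['output']['names'])`.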

In a similar vein, it would be nice to have a supported way of finding out which version of H2O the model was built with. I cannot find the version number in the JSON.

Clem Wang

1 Answer


If you want to know which feature columns the model used after you have built it, you can do the following in Python:

my_training_frame = your_model.actual_params['training_frame']

which returns the ID of the training frame. Then you can fetch the frame itself:

col_used = h2o.get_frame(my_training_frame)
col_used

EDIT (after the comment below was posted):

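To get the column names, use col_used.columns (col_used.col_names also works). Note that this lists every column of the training frame, including the response and any columns excluded from training (for example via x or ignored_columns), not only the features the model used.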
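Building on the answer, a sketch that also drops the response and ignored columns, assuming `actual_params` exposes `response_column` and `ignored_columns` as plain column names (this matches recent h2o-py releases as far as I know; I have not checked v3.13). `features_from_training_frame` is a hypothetical helper, and `get_frame` is passed in so you can hand it `h2o.get_frame`. Weights, offset, and fold columns are not removed. As mirekphd notes in the comments, this only works while the training frame still exists in the cluster.

```python
from typing import Any, Callable, List


def features_from_training_frame(model: Any, get_frame: Callable[[str], Any]) -> List[str]:
    """List training-frame columns minus the response and ignored columns.

    Assumes model.actual_params is a dict with 'training_frame' (frame id),
    'response_column' (name or None) and 'ignored_columns' (list or None).
    Weights/offset/fold columns, if any, are NOT filtered out here.
    """
    params = model.actual_params
    frame_id = params.get("training_frame")
    if frame_id is None:
        raise LookupError("model has no recorded training_frame id")
    frame = get_frame(frame_id)
    # Defensive: treat a missing frame as an error (it may have been removed).
    if frame is None:
        raise LookupError(f"training frame {frame_id!r} not found")
    excluded = set(params.get("ignored_columns") or [])
    response = params.get("response_column")
    if response is not None:
        excluded.add(response)
    return [c for c in frame.columns if c not in excluded]
```

With a live model this would be called as `features_from_training_frame(your_model, h2o.get_frame)`.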

Also, a quick way to check the version of a saved binary model is to try to load it into H2O: if it loads, it was saved with the same version of H2O; if not, you will get a warning.

You can also open the saved model file; the first line lists the version of H2O used to create it.

For a model saved as a MOJO, look at the model.ini file inside the MOJO zip; it lists the version of H2O.
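A sketch of reading that version programmatically, assuming model.ini sits at the root of the MOJO zip and contains a line of the form `h2o_version = ...` (that is what I have seen in MOJOs, but the layout is not a documented contract). `mojo_h2o_version` is a hypothetical helper; it parses lines by hand rather than with configparser because other sections of model.ini may contain bare lines without `=`.

```python
import io
import zipfile
from typing import BinaryIO, Optional, Union


def mojo_h2o_version(mojo: Union[str, BinaryIO]) -> Optional[str]:
    """Return the h2o_version value from a MOJO's model.ini, or None.

    `mojo` may be a path to the MOJO zip or an open binary file object.
    """
    with zipfile.ZipFile(mojo) as zf:
        names = zf.namelist()
        ini_name = next(
            (n for n in names if n == "model.ini" or n.endswith("/model.ini")), None
        )
        if ini_name is None:
            return None
        text = zf.read(ini_name).decode("utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "h2o_version":
            return value.strip()
    return None


if __name__ == "__main__":
    # Build a tiny fake MOJO in memory just to exercise the parser.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("model.ini", "[info]\nh2o_version = 3.13.0.341\nalgo = gbm\n")
    buf.seek(0)
    print(mojo_h2o_version(buf))  # prints 3.13.0.341
```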

Laurel
Lauren
  • col_used.col_names is the final step I was looking for. Thanks! h2o.init doesn't tell me the version the model was built with. That's what I was looking for. – Clem Wang Jul 18 '17 at 03:01
  • After loading my trained model and using the code above, I'm getting "TypeError: 'property' object has no attribute '__getitem__'" – Ege Oct 23 '19 at 08:59
  • hi @Ege can you create a separate stackoverflow question with reproducible code, that will make it easier for folks to debug your issue. thanks! – Lauren Oct 24 '19 at 16:23
  • Accessing `training_frame` won't work if it was removed after training, if the h2o instance was restarted, or the model has been archived and re-imported (stored on disk, in a code repo, data lake connected to MLflow, etc). – mirekphd Jun 16 '21 at 09:28