Is there a supported way to get the list of features used by an H2O model during its training?

Question

This is my situation. I have over 400 features, many of which are probably useless and often zero. I would like to be able to:

train a model with a subset of those features
query that model for the features actually used to build that model
build a H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict.)
pass this newly constructed frame to H2OModel.predict() to get a prediction
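
The third step above (turning a sparse list of non-zero values into a full row) can be sketched in plain Python. This is a minimal sketch, not an H2O API: `densify` and the column names are hypothetical, and it assumes missing features should be filled with zero and that features the model did not use can be dropped.

```python
def densify(sparse_row, columns, fill=0.0):
    """Expand a {feature_name: value} dict into a full row ordered like `columns`.

    Features absent from `sparse_row` get `fill`; features not listed in
    `columns` (i.e. not used by the model) are silently ignored.
    """
    return [sparse_row.get(name, fill) for name in columns]


# Hypothetical feature names, standing in for the columns the model was trained on.
columns = ["f1", "f7", "f42"]
rows = [
    densify({"f7": 2.5, "f300": 1.0}, columns),  # "f300" is not a model feature
    densify({}, columns),                        # a row with no non-zero values
]
```

With a running cluster, `rows` and `columns` could then be passed to `h2o.H2OFrame(rows, column_names=columns)` and the resulting frame to `predict()`.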

I am pretty sure what I found is unsupported, but it works for now (v 3.13.0.341). Is there a more robust/supported way of doing this?

model._model_json['output']['names']

The response variable appears to be the last item in this list.
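
A small sketch of how that list could be split into predictors and response. It relies only on the observation above that the response is the last item, which is an assumption and not documented behaviour; `split_names` is a hypothetical helper, and the dict in the example is a stand-in for `model._model_json`.

```python
def split_names(model_json):
    """Split output names into (predictors, response).

    Assumes the response column is the last entry of
    model_json['output']['names'], as observed for a supervised model.
    """
    names = list(model_json["output"]["names"])
    return names[:-1], names[-1]


# Stand-in for model._model_json, with made-up column names.
fake_json = {"output": {"names": ["f1", "f7", "f42", "label"]}}
predictors, response = split_names(fake_json)
```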

In a similar vein, it would be nice to have a supported way of finding out which H2O version the model was built with. I cannot find the version number in the JSON.

score 4 · Accepted Answer · edited Sep 11 '22 at 00:35

If you want to know which feature columns the model used after you have built it, you can do the following in Python:

my_training_frame = your_model.actual_params['training_frame']

This returns the ID of the training frame.

You can then fetch that frame by its ID:

col_used = h2o.get_frame(my_training_frame)
col_used

EDITED (after comment was posted)

To get the columns use: col_used.columns

Also, a quick way to check the version of a saved binary model is to try loading it into H2O: if it loads, it was built with the same version of H2O; if it wasn't, you will get a warning.

You can also open the saved model file; the first line lists the version of H2O used to create it.

For a model saved as a mojo you can look at the model.ini file. It will list the version of H2O.
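
A minimal sketch of reading that value programmatically. It assumes the MOJO is a zip archive with `model.ini` at its root and that the version is stored under a key named `h2o_version`; both are assumptions worth checking against your own file. `mojo_h2o_version` is a hypothetical helper, not part of the h2o package.

```python
import zipfile


def mojo_h2o_version(mojo_path):
    """Return the `h2o_version` value from a MOJO's model.ini, or None if absent.

    `mojo_path` may be a filesystem path or a binary file-like object.
    The file is scanned line by line rather than parsed as strict INI,
    since it may contain sections that are plain lists of names.
    """
    with zipfile.ZipFile(mojo_path) as archive:
        text = archive.read("model.ini").decode("utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "h2o_version":
            return value.strip()
    return None
```

Usage would look like `mojo_h2o_version("my_model.zip")`, where the file name is a placeholder.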

edited Sep 11 '22 at 00:35 by Laurel

answered Jul 17 '17 at 22:24 by Lauren

col_used.col_names is the final step I was looking for. Thanks! h2o.init doesn't tell me the version the model was built with. That's what I was looking for. – Clem Wang Jul 18 '17 at 03:01
After loading my trained model and using the code above, I'm getting "TypeError: 'property' object has no attribute '__getitem__'" – Ege Oct 23 '19 at 08:59
hi @Ege can you create a separate stackoverflow question with reproducible code, that will make it easier for folks to debug your issue. thanks! – Lauren Oct 24 '19 at 16:23
Accessing `training_frame` won't work if it was removed after training, if the h2o instance was restarted, or the model has been archived and re-imported (stored on disk, in a code repo, data lake connected to MLflow, etc). – mirekphd Jun 16 '21 at 09:28
