Running h2o.automl() returns a single model in the leaderboard. However, when I try to access the actual model via @leader@model, I get the following error:

Error in is.H2OFrame(x) : trying to get slot "metrics" from an object of a basic class ("NULL") with no slots

Calling h2o.predict() on the leader model also fails, with this error message:

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, : ERROR MESSAGE: Object 'dummy' not found in function: predict for argument: model

The model was trained in the same session, using h2o v3.20.0.2 in R.

Saurabh Chauhan
urganmax
  • How big is your training set? I assume you ran for the default time period of 3600 seconds (1 hour)? – Erin LeDell Aug 08 '18 at 16:12
  • The training set has dimensions of 310 x 119886, and I ran it for 3600 seconds indeed. I'll try to extend that number and see. – urganmax Aug 09 '18 at 17:19

1 Answer

I think what's happening is that AutoML can't finish training even a single model in one hour, so when you try to collect the leader model, it grabs an incomplete model and you get an error. You don't have very many rows, but you have a really large number of columns.

Since it's hard to predict how long the model training will take, I'd use the max_models argument instead of limiting by time. AutoML stops as soon as it hits either max_models or max_runtime_secs, whichever comes first, so I'd set max_runtime_secs to a very large number (e.g. 999999999) and then set max_models = 10 or whatever number you like.

Second, since you have very wide data, I'd recommend turning off the Random Forest and GBM models and keeping the GLM and Deep Learning models. To do that, set exclude_algos = c("DRF", "GBM"). It will take a really long time to train tree-based models on 120k columns.
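Putting those two suggestions together, a minimal sketch might look like the following. The `iris` data and the `train` and `y` names are stand-ins I made up so the snippet is runnable; substitute your own H2OFrame and response column. The leaderboard check before predicting is my own precaution, not something AutoML requires.

```r
library(h2o)
h2o.init()

# Stand-in data (assumption): replace with your own H2OFrame and response column.
train <- as.h2o(iris)
y <- "Species"

aml <- h2o.automl(
  y = y,
  training_frame = train,
  max_models = 10,                  # stop after 10 models ...
  max_runtime_secs = 999999999,     # ... rather than after a time limit
  exclude_algos = c("DRF", "GBM"),  # skip the tree-based models on very wide data
  seed = 1
)

# Look at what actually finished before touching the leader.
print(aml@leaderboard)

if (nrow(aml@leaderboard) > 0) {
  pred <- h2o.predict(aml@leader, train)
  print(head(pred))
}
```

If the leaderboard comes back empty, no model finished and `aml@leader` will not be usable for prediction.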

Another good option is to first apply PCA or GLRM to your data to reduce the dimensionality to fewer than 500 columns; then you can include the tree-based models in the AutoML run.
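As a rough sketch of the PCA route, assuming the same stand-in `iris` frame as a placeholder for your data: fit `h2o.prcomp()` on the predictors, project the frame onto the components, re-attach the response, and run AutoML on the reduced frame. The choice of `k`, the `transform` setting, and the variable names are illustrative assumptions, not tuned values. Note that `k` cannot exceed the smaller of the row and column counts, and on 120k columns you may need to try a different `pca_method` than the default.

```r
library(h2o)
h2o.init()

# Stand-in data (assumption): replace with your own H2OFrame and response column.
train <- as.h2o(iris)
y <- "Species"
predictors <- setdiff(names(train), y)

# Fit PCA on the predictors only; k = 2 suits iris, pick your own k (< 500).
pca <- h2o.prcomp(
  training_frame = train,
  x = predictors,
  k = 2,
  transform = "STANDARDIZE",
  impute_missing = TRUE
)

# Project onto the principal components and put the response back.
reduced <- h2o.predict(pca, train)
reduced <- h2o.cbind(reduced, train[, y])

# With few columns, the tree-based models can stay in the run.
aml <- h2o.automl(
  y = y,
  training_frame = reduced,
  max_models = 10,
  seed = 1
)
print(aml@leaderboard)
```

Any new data you want to score has to go through the same `h2o.predict(pca, ...)` projection first.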

Erin LeDell
  • @KingJulien Why are you commenting here for another issue? Your comment is not related to this question or my answer. This is the second issue you've posted on, please do not do this. – Erin LeDell Aug 30 '18 at 21:23
  • I wanted your help that's why I commented here. I will delete my comment. Thanks – King Julien Aug 30 '18 at 21:33
  • Hi @ErinLeDell, I am getting the same error but on a significantly smaller data set. My feature set is 10k rows x 20 columns. I'm trying to train a multi class classifier with `balance_classes = TRUE` with `stopping_metric = TRUE` and `sort_metric = TRUE`. I have also tried your trick of setting `max_runtime_secs` to a large value and setting `max_models`. My H2O version is 3.23.0.4532. I'm guessing balance_classes and multi class don't play nice? Or is there something else that might be going on? Thanks very much! – elvikingo Mar 13 '19 at 06:16
  • @elvikingo Both `stopping_metric` and `sort_metric` take a string as a value, they are not boolean. You are using a nightly version (can you re-try on the stable version 3.22?). If you can post a reproducible example on Stack Overflow or even better, file a JIRA ticket with the example, https://0xdata.atlassian.net/projects/PUBDEV, that would be super helpful. Thanks! – Erin LeDell Mar 13 '19 at 06:55
  • @ErinLeDell thanks for your response, Erin. I'm sorry I made a mistake while writing that comment. I definitely had `stopping_metric = "AUC"` and not TRUE. I changed to the stable version 3.22.1.6 and it seems to be working fine. Would you still like me to file a ticket for the nightly version? – elvikingo Mar 13 '19 at 23:34
  • @elvikingo If you have a reproducible example (maybe using the iris dataset for example), then yes, it would be great to file a JIRA ticket, thank you! – Erin LeDell Mar 14 '19 at 04:18