
R Version: R version 3.5.1 (2018-07-02)

H2O cluster version: 3.20.0.2

The dataset used here is from Kaggle (Home Credit Default Risk). Before running H2O AutoML, I had already handled the missing values and selected the relevant categorical variables. Calling h2o.performance() on the AutoML leader fails with the error below. Can you help me figure out the underlying cause of this error? Thanks

Code:

library(h2o)
library(dplyr)   # for bind_cols()

h2o.init()
h2o.no_progress()

# y_train_processed_tbl holds the target variable
# x_train_processed_tbl holds the remaining predictors after handling missing values
data_h2o   <- as.h2o(bind_cols(y_train_processed_tbl, x_train_processed_tbl))
splits_h2o <- h2o.splitFrame(data_h2o, ratios = c(0.7, 0.15), seed = 1234)
train_h2o  <- splits_h2o[[1]]
valid_h2o  <- splits_h2o[[2]]
test_h2o   <- splits_h2o[[3]]

y <- "TARGET"
x <- setdiff(names(train_h2o), y)

automl_models_h2o <- h2o.automl(
  x = x, y = y,
  training_frame    = train_h2o,
  validation_frame  = valid_h2o,
  leaderboard_frame = test_h2o,
  max_runtime_secs  = 90
)

automl_leader <- automl_models_h2o@leader

# The error is raised by this call
performance_h2o <- h2o.performance(automl_leader, newdata = test_h2o)


ERROR: Unexpected HTTP Status code: 404 Not Found

water.exceptions.H2OKeyNotFoundArgumentException
 [1] "water.exceptions.H2OKeyNotFoundArgumentException: Object 'dummy' not found in function: predict for argument: model"
 [2] "    water.api.ModelMetricsHandler.score(ModelMetricsHandler.java:235)"  
 [3] "    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"                                                    
 [4] "    sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)"                                                    
 [5] "    sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)"                                                
 [6] "    java.lang.reflect.Method.invoke(Unknown Source)"                                                                
 [7] "    water.api.Handler.handle(Handler.java:63)"                                                                      
 [8] "    water.api.RequestServer.serve(RequestServer.java:451)"                                                          
 [9] "    water.api.RequestServer.doGeneric(RequestServer.java:296)"                                                      
[10] "    water.api.RequestServer.doPost(RequestServer.java:222)"                                                         
[11] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"                                                   
[12] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"                                                   
[13] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"                                         
[14] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"                                     
[15] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"                             
[16] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"                                      
[17] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"                              
[18] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"                                  
[19] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"                          
[20] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"                                
[21] "    water.JettyHTTPD$LoginHandler.handle(JettyHTTPD.java:197)"                                                      
[22] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"                          
[23] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"                                
[24] "    org.eclipse.jetty.server.Server.handle(Server.java:370)"                                                        
[25] "    org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"                 
[26] "    org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"                  
[27] "    org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"                       
[28] "    org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"       
[29] "    org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"                                               
[30] "    org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"                                          
[31] "    org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"                         
[32] "    org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"                   
[33] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"                               
[34] "    org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"                                
[35] "    java.lang.Thread.run(Unknown Source)"                                                                           



Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 

ERROR MESSAGE:

Object 'dummy' not found in function: predict for argument: model
Ankit
  • Are you able to run the following AutoML example code from the documentation without any issues: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html#code-examples ? – Lauren Aug 15 '18 at 03:30
  • @Lauren, yes, the AutoML example code from the documentation runs without any issues. – Ankit Aug 15 '18 at 04:56
  • @divibisan, I understand that there might be a typo in the URL; however, if that is the case, the link is generated internally by H2O. – Ankit Aug 15 '18 at 05:58
  • I just wanted to comment that you do not need to impute missing data, nor encode categorical variables when using H2O. It is performed automatically. You may have worse performance if you dummy/one-hot encode your categorical columns manually. I would not recommend it. – Erin LeDell Aug 15 '18 at 17:34
  • @ErinLeDell Thanks for the guidance. Just to clarify, I have only removed the categorical variables with very few levels. I originally applied these treatments because I was trying some other models before using H2O. – Ankit Aug 15 '18 at 17:58
  • @Ankit Ok, great... just checking! – Erin LeDell Aug 15 '18 at 18:00

2 Answers


The issue here is that you only gave AutoML 90 seconds to run, so it did not have time to train even one model. In the next stable release of H2O, the error message will be gone and instead you will simply get a Leaderboard with no rows (we are fixing this so that it's handled more gracefully).

Rather than using max_runtime_secs = 90, you could increase it to something much larger (the default is 3600 secs, or 1 hour). Alternatively, you can specify the number of models you want by setting, for example, max_models = 20.

If you do use max_models, I'd recommend setting max_runtime_secs to something large (e.g. 999999999) so that you don't run out of time. The AutoML process will stop when it reaches the first of max_models or max_runtime_secs.
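A minimal, self-contained sketch of the max_models suggestion. It uses R's built-in iris data as a stand-in for the Home Credit frames from the question (an assumption made only so the snippet runs on its own); in practice, substitute train_h2o, valid_h2o and test_h2o from the question. Because max_runtime_secs is set very high, max_models is what stops AutoML, so the leaderboard should contain trained models before h2o.performance() is called.

```r
library(h2o)
h2o.init()
h2o.no_progress()

# Stand-in data (assumption): iris instead of the Home Credit frames from the question
data_h2o   <- as.h2o(iris)
splits_h2o <- h2o.splitFrame(data_h2o, ratios = c(0.7, 0.15), seed = 1234)
train_h2o  <- splits_h2o[[1]]
valid_h2o  <- splits_h2o[[2]]
test_h2o   <- splits_h2o[[3]]

y <- "Species"
x <- setdiff(names(train_h2o), y)

# Cap AutoML by model count; the very large time budget means
# max_models is the limit that is reached first
automl_models_h2o <- h2o.automl(
  x = x, y = y,
  training_frame    = train_h2o,
  validation_frame  = valid_h2o,
  leaderboard_frame = test_h2o,
  max_models        = 20,
  max_runtime_secs  = 999999999,
  seed              = 1234
)

# The leaderboard should now list trained models
print(automl_models_h2o@leaderboard)

# Scoring the leader works once at least one model exists
performance_h2o <- h2o.performance(automl_models_h2o@leader, newdata = test_h2o)
```

Newer H2O releases may warn that validation_frame is ignored when cross-validation is enabled; the max_models and max_runtime_secs interplay is unaffected.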

I posted a similar answer here.

Erin LeDell

My code was working fine, then I tweaked it and got the same error.

To fix it, instead of using automl_models_h2o@leader to get the leader for predictions/performance, retrieve the leader with h2o.getModel().

Change your automl_leader initialization:

...

# Print the leaderboard to find the model IDs
automl_models_h2o@leaderboard

# Replace MODEL_NAME_HERE with a model_id from your leaderboard
automl_leader <- h2o.getModel("MODEL_NAME_HERE")

performance_h2o <- h2o.performance(automl_leader, newdata = test_h2o)

...
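Rather than pasting the model ID by hand, the same h2o.getModel() approach can read the top row of the leaderboard. The sketch below is a hypothetical helper (get_automl_leader is not part of the h2o package). It assumes the leaderboard is sorted best-first and has a model_id column, and it stops early with a readable message when the leaderboard is empty, which is the situation the first answer describes (in H2O 3.20.0.2 the error suggests the leader then pointed at a placeholder key named 'dummy').

```r
# Hypothetical helper: fetch the top AutoML model by ID, or fail clearly
# when AutoML did not finish training any model.
get_automl_leader <- function(aml) {
  lb <- aml@leaderboard
  if (nrow(lb) == 0) {
    stop("AutoML trained no models; increase max_runtime_secs or set max_models.")
  }
  # Assumption: the leaderboard is sorted best-first and has a model_id column
  leader_id <- as.character(as.data.frame(lb)$model_id[1])
  h2o::h2o.getModel(leader_id)
}
```

Usage with the objects from the question would be `automl_leader <- get_automl_leader(automl_models_h2o)` followed by `h2o.performance(automl_leader, newdata = test_h2o)`.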
agentcurry