If this is creating a bottleneck, use a MOJO (or POJO) model for row-wise scoring instead of a model loaded into memory in the H2O cluster. Fast scoring is exactly what the MOJO/POJO model format is designed for: it avoids converting between an R data.frame and an H2OFrame, and it does not require a running H2O cluster. You can skip R altogether here and score directly from Java.
Alternatively, if your pipeline requires R, you can still use the MOJO/POJO model from R via the h2o.predict_json() function; it just requires converting your 1-row data.frame to a JSON string. That should alleviate the bottleneck somewhat, though scoring the MOJO/POJO directly from Java (as mentioned above) will still be the fastest.
Here's an example of what this looks like using a GBM MOJO file:
library(h2o)
# path to the MOJO zip file exported from a trained GBM model
model_path <- "~/GBM_model_python_1473313897851_6.zip"
# a single input row encoded as a JSON string
json <- '{"V1":1, "V2":3.0, "V3":0}'
# score the row against the MOJO file directly (no H2OFrame conversion needed)
pred <- h2o.predict_json(model = model_path, json = json)
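If you don't already have a MOJO file on disk, you can export one from a trained model in your current H2O session using h2o.download_mojo(). A quick sketch, where my_gbm stands in for whatever trained model object you have:
# `my_gbm` is a placeholder for a trained H2O model in the current session
h2o.download_mojo(my_gbm, path = "~")  # writes the MOJO zip (named after the model id) into ~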
Here's how to construct the JSON string from a 1-row data.frame:
df <- data.frame(V1 = 1, V2 = 3.0, V3 = 0)
# build a "name":value pair for each column of the first row
# (this simple version assumes numeric columns; character/factor values would need quoting)
dfstr <- sapply(seq_along(df), function(i) paste0('"', names(df)[i], '":', df[1, i]))
json <- paste0('{', paste0(dfstr, collapse = ','), '}')
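To make this reusable, you could wrap the JSON conversion and the scoring call into a small helper. Here's a minimal sketch under the same assumptions as above (numeric columns only; predict_row is just an illustrative name, not part of the h2o package):
# illustrative helper: convert a 1-row data.frame of numeric columns to a JSON
# string and score it against a MOJO/POJO file via h2o.predict_json()
predict_row <- function(model_path, row) {
  pairs <- sapply(seq_along(row), function(i) paste0('"', names(row)[i], '":', row[1, i]))
  json <- paste0('{', paste0(pairs, collapse = ','), '}')
  h2o.predict_json(model = model_path, json = json)
}

pred <- predict_row("~/GBM_model_python_1473313897851_6.zip",
                    data.frame(V1 = 1, V2 = 3.0, V3 = 0))
As noted above, this still won't be as fast as scoring the MOJO/POJO directly from Java, but it keeps everything inside your R pipeline.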