Questions tagged [h2o]

Use this tag for questions about the H2O in-memory machine learning platform. Where relevant, add language tags like [r], [python], [scala], or [java].

Best Practices

Always post a Minimal, Complete, and Verifiable Example (MCVE) and provide the H2O version number and client type (Python, R, Flow, etc.).
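
For Python users, one way to collect the version and client details requested above is a small helper like the sketch below. It is an illustration only: the function name `describe_environment` is hypothetical, and it reads the version of the installed `h2o` Python package from package metadata, which may differ from the version of the H2O cluster you are connected to.

```python
import platform
from importlib import metadata


def describe_environment(package: str = "h2o") -> str:
    """Return a short, paste-ready summary of the client environment.

    `describe_environment` is a hypothetical helper, not part of H2O.
    """
    try:
        # Version of the installed Python package (not the running cluster).
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    lines = [
        f"{package} version: {pkg_version}",
        "client: Python " + platform.python_version(),
        "OS: " + platform.platform(),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_environment())
```

Paste the printed lines at the top of your question, next to the MCVE.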

If your question is not code-related, do not post it to Stack Overflow (per Stack Overflow guidelines). If your question is algorithm-related, post it to Cross Validated on Stack Exchange using the "h2o" tag. All other questions can be posted to the h2ostream Google group (please do not double-post).

1875 questions
6 votes, 4 answers

Get model details from H2O model object

I have a rather simple question, but have not been able to find a documented solution anywhere. I'm currently building a pipeline with H2O models and as part of the process I need to write some basic information about each trained model into a…
Karl
6 votes, 2 answers

h2o package: total cluster memory zero

data1.dl.r2 = vector() for (i in 1:100) { if (i==1) { data1.hex = as.h2o(data1) } else { data1.hex = nextdata } data1.dl = h2o.deeplearning …
6 votes, 1 answer

H2O: NullPointerException error while building ensemble model using deep learning grid

I am trying to build a stacked ensemble model to predict merchant churn using R (version 3.3.3) and deep learning in h2o (version 3.10.5.1). The response variable is binary. At the moment I am trying to run the code to build a stacked ensemble model…
delpat
6 votes, 2 answers

Parallel processing in R with H2O

I am setting up a piece of code to parallelize some computations for N groups in my data using foreach. I have a computation that involves a call to h2o.gbm. In my current, sequential set-up, I use up to about 70% of my RAM. How do I…
wake_wake
6 votes, 1 answer

H2O in Kubernetes

Has anyone managed to run an H2O cluster in Kubernetes? I tried two options, both using a flatfile: 1) using a StatefulSet, but since the IP generated for the pod can change, the cluster is unreliable; 2) using a bunch of service/deployment pairs and…
6 votes, 1 answer

What is the difference between h2o.ensemble and h2o.stack in the h2oEnsemble package?

According to the function descriptions: h2o.stack: This function creates a "Super Learner" (stacking) ensemble using a list of existing H2O base models specified by the user. h2o.ensemble: This function creates a "Super Learner" (stacking) ensemble…
Tao Hu
6 votes, 1 answer

How to cast data from long to wide format in H2O?

I have data in a normalised, tidy "long" data structure that I want to upload to H2O and, if possible, analyse on a single machine (or have a definitive finding that I need more hardware and software than currently available). The data is large but not…
Peter Ellis
6 votes, 3 answers

Print "pretty" tables for h2o models in R

There are multiple packages for R which help to print "pretty" tables (LaTeX/HTML/TEXT) from statistical models output AND to easily compare the results of alternative model specifications. Some of these packages are apsrtable, xtable, memisc,…
majom
6 votes, 2 answers

Error in running h2o.ensemble

I am getting an error while running h2o.ensemble in R. This is the error output: [1] "Cross-validating and training base learner 1: h2o.glm.wrapper" |======================================================================| 100% [1] "Cross-validating…
saurabh agarwal
6 votes, 2 answers

as.h2o() in R to upload files to h2o environment takes a long time

I am using h2o to carry out some modelling and, having tuned the model, I would now like to use it to carry out a lot of predictions: approximately 6 billion predictions/rows, with each prediction row needing 80 columns of data. The dataset I have already broken…
h.l.m
5 votes, 1 answer

Error when using random effect with h2o.glm in R

I would like to use h2o in R for GLM regression with random effects (HGLM, which seems possible from this page). I have not managed to make it work yet, and I get errors I do not understand. Here is my working example: I define a dataset with Simpson…
denis
5 votes, 1 answer

Negative SHAP values in H2O in Python using predict_contributions

I have been trying to compute SHAP values for a Gradient Boosting Classifier with the H2O module in Python. Below is the example adapted from the documentation for the predict_contributions method (adapted from…
jessicalfr
5 votes, 2 answers

Saving an H2O data frame

I am working with a 10 GB training data frame. I use the H2O library for faster computation. Each time I load the dataset, I have to convert the data frame into an H2O object, which takes a long time. Is there a way to store the converted H2O object? (so…
Karanam Krishna
5 votes, 2 answers

How to best use zipcodes in Random Forest model training?

I have a dataset with a zipcode column. The zip codes have some significance for the output, and I want to use them as a feature. I am using a random forest model. I need suggestions on the best way to use the zipcode column as a feature. (For example, should I get lat/long…
5 votes, 1 answer

Custom loss function in H2O

I am using H2O via R. I am trying to build random forest, XGBoost, and GBM models to solve a multiclass problem. The model performance insights that H2O provides are great, but as one of the success criteria I have my own custom function that scores the…
sarang