Questions tagged [h2o]

Use this tag for questions about the H2O in-memory machine learning platform. Where relevant, add language tags like [r], [python], [scala], or [java].

Best Practices

Always post a Minimal, Complete and Verifiable Example (MCVE) and provide the H2O version number and client type (Python, R, Flow, etc).
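
As a rough illustration of the version details worth pasting into a question, the sketch below builds a one-line environment summary using only the Python standard library. The helper name `environment_summary` is hypothetical, and the sketch assumes the Python client is installed under the distribution name `h2o`. It reports the installed client package only, not the version of a running H2O cluster.

```python
# Sketch: collect version details to paste into a question.
# Assumption: the Python client is installed under the distribution name "h2o".
import importlib.metadata as metadata
import platform


def environment_summary(package="h2o"):
    """Return a one-line summary of the package version and Python runtime."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        version = "not installed"
    return (
        f"{package} {version}, "
        f"Python {platform.python_version()}, "
        f"{platform.system()}"
    )


print(environment_summary())
```

R and Flow users would need the equivalent information from their own client.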

If your question is not code-related, do not post to Stack Overflow (per Stack Overflow guidelines). If your question is algorithm-related, post to Cross Validated on Stack Exchange using the "h2o" tag. All other questions can be posted to the h2ostream Google group (please do not double-post).

1875 questions
7
votes
1 answer

Specifying a url to a .whl file in a conda env .yml file

I want a specific older version of a package (h2o) to be installed when I load a conda env .yml file. However, the older versions of this package only seem to work if I install them using pip directly from the URL hosting the .whl file. For…
Dan
  • 45,079
  • 17
  • 88
  • 157
7
votes
2 answers

h2o warning message too old cluster

Hi, I'm using h2o in R. Just a couple of weeks ago I updated the h2o package to the latest version: h2o.getVersion() [1] "3.20.0.2". But when I initialize a new h2o session with h2o.init, I receive a warning message like this: In h2o.clusterInfo() : Your…
Marco Fumagalli
  • 2,307
  • 3
  • 23
  • 41
7
votes
1 answer

How to save All models from h2o automl

I'm trying to save all the models from an h2o.automl as part of the h2o package. Currently I am able to save a single model using h2o.saveModel(aml@leader, path = "/home/data/user"). How can I save all the models? Here is my attempt on a sample…
Ryan John
  • 1,410
  • 1
  • 15
  • 23
7
votes
1 answer

How to get data into h2o fast

What my question isn't: "Efficient way to maintain a h2o data frame"; "H2O running slower than data.table R"; "Loading data bigger than the memory size in h2o". Hardware/Space: 32 Xeon threads w/ ~256 GB RAM; ~65 GB of data to upload. (about 5.6…
EngrStudent
  • 1,924
  • 31
  • 46
7
votes
3 answers

H2OFrame() in Python is adding additional duplicate rows to the Pandas DataFrame- Bug?

When converting a Pandas DataFrame to an H2O frame using the h2o.H2OFrame() function, an error occurs: additional rows are created in the H2O frame. When I looked into this, it appears the new rows are duplicates of other rows. Depending…
George
  • 674
  • 2
  • 7
  • 19
7
votes
1 answer

Python H2O Memory Management

Similar to this question in R here, I get out-of-memory issues when running loops with grid search in H2O. In R, calling gc() during each loop did help. What is the proposed solution here?
user90772
  • 387
  • 1
  • 5
  • 12
7
votes
0 answers

What are the constraints of FUN in H2O's apply function?

I am using h2o version 3.10.4.8. library(h2o) h2o.init(nthreads = -1) df <- as.h2o(data.frame(x = 1:5, y = 11:15)) I'm trying to understand how to use the apply() function in H2O. The following works as expected: h2o::apply(df, 2,…
mauna
  • 1,098
  • 13
  • 25
7
votes
2 answers

Can I use autoencoder for clustering?

In the code below, they use an autoencoder for supervised clustering or classification because they have data labels: http://amunategui.github.io/anomaly-detection-h2o/ But can I use an autoencoder to cluster data if I do not have its labels? Regards
forever
  • 139
  • 1
  • 2
  • 8
7
votes
2 answers

How to drop rows in an H2OFrame?

I've worked in the h2o R package for quite a while now, but have recently had to move to the Python package. For the most part, an H2OFrame is designed to work like a pandas DataFrame object. However, there are several hurdles I haven't managed to…
TayTay
  • 6,882
  • 4
  • 44
  • 65
7
votes
1 answer

How to get sparse matrices into H2O?

I am trying to get a sparse matrix into H2O and I was wondering whether that was possible. Suppose we have the following: test <- Matrix(c(1,0,0,1,1,1,1,0,1), nrow = 3, sparse = TRUE) and assuming my local H2O is localH2O, I can't seem to do the…
Snowflake
  • 2,869
  • 3
  • 22
  • 44
7
votes
2 answers

Why connection is terminating

I'm trying a random forest classification model using the H2O library inside R on a training set with 70 million rows and 25 numeric features. The total file size is 5.6 GB. The validation file's size is 1 GB. I have 16 GB RAM and 8 core CPU on my…
rks
  • 213
  • 5
  • 13
7
votes
4 answers

R H2O - Memory management

I'm trying to use H2O via R to build multiple models using subsets of one large-ish data set (~10 GB). The data is one year's worth of data and I'm trying to build 51 models (i.e., train on week 1, predict on week 2, etc.) with each week being about…
screechOwl
  • 27,310
  • 61
  • 158
  • 267
6
votes
2 answers

How to interpret the probabilities (p0, p1) of the result of h2o.predict()

I would like to understand the meaning of the value (result) of the h2o.predict() function from the H2O R package. I realized that in some cases when the predict column is 1, the p1 column has a lower value than the p0 column. My interpretation of p0 and p1…
David Leal
  • 6,373
  • 4
  • 29
  • 56
6
votes
2 answers

How to fetch details of non-leader models generated by h2o automl?

After running automl (classification of 3 classes), I can see a list of models as follows: model_id mean_per_class_error StackedEnsemble_BestOfFamily_0_AutoML_20180420_174925 …
slowD
  • 339
  • 2
  • 13
6
votes
1 answer

h2o.ai Platt Scaling calibration

I noticed a relatively recent addition to the h2o.ai suite: the ability to perform supplementary Platt Scaling to improve the calibration of output probabilities. (See calibrate_model in the h2o manual.) Nevertheless, little guidance is available on the online…
Giorgio Spedicato
  • 2,413
  • 3
  • 31
  • 45