Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

From the Sparkling Water GitHub repository:

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O frames and vice versa
  • A DSL to use Spark data structures as input for H2O's algorithms
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs
  • A Python interface enabling use of Sparkling Water directly from PySpark

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major.minor release line; for example, for Spark 1.6 use the Sparkling Water 1.6 branch.
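As a sketch of that rule, the hypothetical helper below reduces a full Spark version string to the major.minor line used to pick a Sparkling Water branch. It only derives the branch name from the version string; it does not check which branches or releases actually exist.

```python
def sparkling_water_branch(spark_version: str) -> str:
    """Return the major.minor release line for a Spark version string.

    Hypothetical helper: "1.6.3" -> "1.6". Sparkling Water branches follow
    the Spark major.minor line, so this is the branch name to look for.
    """
    parts = spark_version.strip().split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unrecognised Spark version: {spark_version!r}")
    return f"{parts[0]}.{parts[1]}"


print(sparkling_water_branch("1.6.3"))  # 1.6
print(sparkling_water_branch("2.4.5"))  # 2.4
```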

Recommended reference sources:

  • Sparkling Water installation guide
  • Sparkling Water documentation
  • Sparkling Water GitHub documentation

129 questions
0 votes, 1 answer

Is it possible to use MLflow and H2O.ai Sparkling Water in a Scala-based project?

I'm solving a Scala data science problem in IntelliJ using Maven. I noticed that MLflow Spark (https://mvnrepository.com/artifact/org.mlflow/mlflow-spark/1.5.0) depends on Scala 2.12, while H2O.ai Sparkling Water depends on Scala 2.11…
proselotis
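Scala libraries are only binary compatible within the same Scala major.minor version, which is why a Scala 2.11 artifact and a Scala 2.12 artifact cannot safely share one classpath. Scala artifacts conventionally carry that version as a suffix on the artifact ID (for example `_2.11`). The helper below is a hypothetical sketch that extracts those suffixes from a list of artifact IDs and reports whether they agree; the `example-lib_2.12` name in the usage line is illustrative, and artifacts published without a suffix still have to be checked by hand.

```python
import re

# Conventional Scala cross-build suffix, e.g. "_2.11" or "_2.12".
_SCALA_SUFFIX = re.compile(r"_(\d+(?:\.\d+)?)$")


def scala_binary_versions(artifact_ids):
    """Map each artifact ID that has a Scala suffix to that suffix's version.

    Artifact IDs without a suffix are skipped, so their Scala dependency
    (if any) is not detected by this check.
    """
    found = {}
    for artifact in artifact_ids:
        match = _SCALA_SUFFIX.search(artifact)
        if match:
            found[artifact] = match.group(1)
    return found


def is_scala_compatible(artifact_ids):
    """True if all suffixed artifacts target a single Scala binary version."""
    return len(set(scala_binary_versions(artifact_ids).values())) <= 1


deps = ["sparkling-water-core_2.11", "example-lib_2.12"]  # illustrative
print(scala_binary_versions(deps))
print(is_scala_compatible(deps))  # False
```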
0 votes, 0 answers

Predict function giving error for XGBoost but running for GBM in H2O

I am building a classification model using H2O in Python. I can build a GBM model and make predictions on the training and test datasets, but when I build an XGBoost model and try to make predictions, I get an error. Below is the GBM code: (Runs perfectly…
0 votes, 1 answer

Disable H2O Flow UI

I'm planning to use H2O with Sparkling Water (version 3.26.0.2) in production, so I need to disable the Flow UI that is available on port 54321. I was unable to find a configuration property to achieve this. Is it possible, or do I have to…
Gandras
0 votes, 2 answers

Run a CoxPH model for a large data set (300 columns, 6 GB) in H2O Sparkling Water

We are trying to run a CoxPH model using h2o and rsparkling on a large data set (6 GB, 300 columns). Whatever Spark configuration we use, we get memory issues. According to H2O, we should only need a cluster about 4 times bigger than the data size, but…
Divya M
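The rule of thumb quoted in the question (give the H2O cluster roughly four times the memory of the data) turns into simple arithmetic. The hypothetical sketch below computes total and per-node memory for a given data size, node count, and factor. Treat the result as a lower bound: actual usage also depends on the algorithm, column types, and intermediate frames, and with the internal backend H2O runs inside the Spark executors, so executor memory has to cover Spark's own needs as well.

```python
import math


def h2o_memory_plan(data_gb: float, nodes: int, factor: float = 4.0):
    """Rough H2O cluster sizing from the "about 4x the data" rule of thumb.

    Returns (total_gb, per_node_gb), with per-node memory rounded up to a
    whole GB. A starting estimate only, not a guarantee.
    """
    if data_gb <= 0 or nodes <= 0 or factor <= 0:
        raise ValueError("data_gb, nodes and factor must be positive")
    total_gb = data_gb * factor
    per_node_gb = math.ceil(total_gb / nodes)
    return total_gb, per_node_gb


# 6 GB of data spread over 3 nodes:
total, per_node = h2o_memory_plan(6, nodes=3)
print(f"total {total:.0f} GB, about {per_node} GB per node")
```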
0 votes, 0 answers

NameError: name 'H2OContext' is not defined in PyCharm

I'm trying to run Sparkling Water from PyCharm. I have installed Spark and sparkling-water-2.4.13, but I get the error "NameError: name 'H2OContext' is not defined". The same code works from the CLI. from pyspark.sql import SparkSession import…
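A `NameError` means the script never bound the name `H2OContext`: either the `from pysparkling import H2OContext` import is missing, or the interpreter PyCharm runs cannot find the `pysparkling` package even though the one used from the CLI can. A quick way to check the second case is to ask each interpreter whether it can resolve the modules; the sketch below uses only the standard library and is a diagnostic aid, not a fix.

```python
import importlib.util
import sys


def can_import(module_name: str) -> bool:
    """Return True if the current interpreter can resolve module_name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this from PyCharm and from the CLI, then compare the output.
    print("interpreter:", sys.executable)
    for name in ("pyspark", "pysparkling"):
        print(f"{name}: {'found' if can_import(name) else 'NOT found'}")
```

If the two runs report different interpreters or different results, the PyCharm project interpreter (or its `PYTHONPATH`) is the likely difference.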
0 votes, 1 answer

Does H2O Sparkling Water allow online training with Kafka as a streaming source?

I'm currently experimenting with the possibilities of Sparkling Water. There are a few possible use cases, including data munging in H2O/Spark, model building and offline training, and online stream prediction. I was wondering whether it is also…
dnks23
0 votes, 1 answer

Create H2O Sparkling-Water app in IntelliJ

I want to set up a Sparkling Water app in IntelliJ. I found the droplet for a project at Sparkling-Water-Droplet, but it has not been updated for a year, and I was wondering whether there is a more recent version or any other template with newer…
dnks23
0 votes, 1 answer

Frame upload/creation on H2O external backend hangs from python/pyspark

I'm experiencing an issue where the h2o.H2OFrame([1,2,3]) command creates a frame in H2O with the internal backend, but not with the external backend. Instead, the connection does not terminate (the frame is created, but the process hangs).…
0_0
0 votes, 2 answers

How to change a column type from numeric to enum in Sparkling Water using Scala?

I need to change numeric columns of an H2O frame to the enum type in Sparkling Water using Scala, and I would also like to know how to print the schema of an H2O frame.
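Conceptually, converting a numeric column to an enum (categorical) column replaces each value with an integer code that indexes into a domain of level labels. The pure-Python sketch below illustrates only that encoding and does not call H2O; the domain ordering (numeric sort, labels rendered as strings) is an assumption made for illustration. In H2O itself the conversion is a library call (for example `asfactor()` in the Python API) rather than hand-written code.

```python
def to_enum(values):
    """Encode numeric values as (domain, codes), like a categorical column.

    The domain holds the distinct non-missing values, sorted numerically and
    rendered as strings; each code is the index of its value in the domain.
    Missing values (None) stay missing. Illustrative sketch, not H2O's code.
    """
    distinct = sorted({v for v in values if v is not None})
    domain = [_label(v) for v in distinct]
    index = {v: i for i, v in enumerate(distinct)}
    codes = [None if v is None else index[v] for v in values]
    return domain, codes


def _label(value):
    # Render 3.0 as "3" so integer-valued floats read like integer levels.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


domain, codes = to_enum([3, 1, None, 3, 2])
print(domain)  # ['1', '2', '3']
print(codes)   # [2, 0, None, 2, 1]
```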
0 votes, 1 answer

rsparkling as_h2o_frame does not work: java.lang.OutOfMemoryError: GC overhead limit exceeded

I first import a dataset from csv to Spark, do some transformation in Spark, and then try to convert it into H2O Frame. Here's my code: library(rsparkling) library(h2o) library(dplyr) library(sparklyr) sc <- spark_connect(master = "local") data <-…
Catiger3331
0 votes, 1 answer

From H2O, is there a way to export N-fold cross-validation results into a data frame?

I am using H2O Sparkling Water to build a GBM model. I know we can view the N-fold cross-validation results using the code below: gbm_model.model_performance(xval = True) But is there a way to save each fold's model performance into a data frame? For…
Gavin
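In recent h2o-py versions, the per-fold numbers appear to be available through `model.cross_validation_metrics_summary()`, which returns a table that can be converted to pandas with `as_data_frame()`. To show the shape such a table takes, the sketch below builds it from plain per-fold metric dictionaries: one row per metric with its mean, standard deviation, and one column per fold. The fold scores in the usage example are hypothetical.

```python
import statistics


def cv_summary(fold_metrics):
    """Build a metrics-by-fold table from one dict of metrics per fold.

    fold_metrics: list like [{"auc": 0.81, "logloss": 0.45}, ...].
    Returns rows of (metric, mean, sd, fold_1, ..., fold_n) that can be
    written to CSV or loaded into any data-frame library.
    """
    metrics = sorted(fold_metrics[0])
    rows = []
    for metric in metrics:
        values = [fold[metric] for fold in fold_metrics]
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        rows.append((metric, statistics.mean(values), sd, *values))
    return rows


# Hypothetical per-fold scores for a 3-fold GBM.
folds = [{"auc": 0.80, "logloss": 0.50},
         {"auc": 0.84, "logloss": 0.46},
         {"auc": 0.82, "logloss": 0.48}]
header = ["metric", "mean", "sd"] + [f"fold_{i + 1}" for i in range(len(folds))]
print(header)
for row in cv_summary(folds):
    print(row)
```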
0 votes, 1 answer

Create partial dependence plot using H2O in spark?

I am trying to create a partial dependence plot using the following code: rf_pdp = rf_model.partial_plot(data = htest, cols = ['var1', 'var2', 'var3'], plot=True) rf_pdp It runs without error and generates a table with mean_response, stddev_response,…
Gavin
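The table that `partial_plot` returns (mean_response, stddev_response, …) comes from a partial dependence computation: for each grid value of a feature, that feature is set to the value in every row, the model predicts, and the predictions are summarized. The sketch below implements that definition for any Python callable model, using the population standard deviation. The toy linear model and data are hypothetical, and H2O's own implementation (grid selection, weights, binning) may differ in detail.

```python
import statistics


def partial_dependence(predict, rows, feature, grid):
    """Partial dependence of predict() on one feature.

    predict: callable taking a dict row and returning a number.
    rows: list of dict rows (the background data).
    For each grid value, overwrite `feature` in every row, predict, and
    report (value, mean prediction, population std of predictions).
    """
    table = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        table.append((value, statistics.mean(preds), statistics.pstdev(preds)))
    return table


def toy_model(row):
    # Hypothetical model: y = 2*var1 + var2.
    return 2 * row["var1"] + row["var2"]


data = [{"var1": 0, "var2": 1}, {"var1": 5, "var2": 3}]
for value, mean_resp, sd_resp in partial_dependence(toy_model, data, "var1", [0, 1, 2]):
    print(value, mean_resp, sd_resp)
```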
0 votes, 1 answer

How to make H2OGridSearch for H2OGradientBoostingEstimator repeatable (Reproducibility) in spark environment?

I am using the following code to run GBM in Sparkling Water. I have set the seed and score_each_iteration, but every time it still generates different results when I check the AUC, even though I have set the seed and…
Gavin
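In a random grid search the candidate hyperparameter combinations are themselves drawn at random, so both the model's `seed` and a seed for the search need to be fixed; in H2O's Python API, `search_criteria` with the `RandomDiscrete` strategy accepts a `seed` key. H2O's reproducibility notes also indicate that results can still vary across runs unless the cluster size and data layout are the same. The sketch below only illustrates the first point, with a hypothetical grid: a dedicated seeded generator makes sampling the candidates repeatable.

```python
import itertools
import random


def sample_grid(hyper_params, max_models, seed):
    """Sample up to max_models distinct combinations from a hyperparameter grid.

    A dedicated random.Random(seed) makes the sample repeatable run to run,
    independent of any other code that uses the global random module.
    """
    names = sorted(hyper_params)
    combos = list(itertools.product(*(hyper_params[n] for n in names)))
    rng = random.Random(seed)
    chosen = rng.sample(combos, k=min(max_models, len(combos)))
    return [dict(zip(names, combo)) for combo in chosen]


# Hypothetical grid for a GBM.
grid = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.05, 0.1], "ntrees": [50, 100]}
first = sample_grid(grid, max_models=4, seed=1234)
second = sample_grid(grid, max_models=4, seed=1234)
print(first == second)  # True: same seed, same candidates
```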
0 votes, 1 answer

h2o deeplearning error when specifying nfolds for cross validation

Has this issue been resolved by now? I encounter the same error message. Use case: I am doing binary classification using h2o's deeplearning() function. Below, I provide randomly generated data the same size as my actual use case. System specs: # R…
KSA
0 votes, 1 answer

Training H2O Stacked Ensemble Models using exported Mojo and Binary Models

I am trying to build stacked ensemble models using the H2O Java APIs. For this, I trained two models, a GBM model and a DRF model, and exported them in both MOJO and binary format. For exporting the models, I used the following code snippet: For Mojo…
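A stacked ensemble's metalearner is not trained on the base models' final predictions on the training data but on their cross-validated (out-of-fold) predictions, the "level-one" data. That is why H2O asks for base models cross-validated with the same `nfolds` and fold assignment and trained with `keep_cross_validation_predictions=True`. The pure-Python sketch below shows the idea with a least-squares metalearner over two base models; the prediction values are hypothetical, and H2O's default metalearner is a GLM rather than this hand-rolled solver.

```python
def fit_two_model_stack(pred_a, pred_b, y):
    """Least-squares weights (w_a, w_b), no intercept, for y ~ w_a*a + w_b*b.

    pred_a, pred_b: out-of-fold predictions from two base models (the
    level-one features); y: the true target for the same rows.
    Solves the 2x2 normal equations with Cramer's rule.
    """
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * t for a, t in zip(pred_a, y))
    sby = sum(b * t for b, t in zip(pred_b, y))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("base-model predictions are collinear")
    w_a = (say * sbb - sab * sby) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b


def stack_predict(weights, a, b):
    """Combine two base-model predictions with the fitted metalearner weights."""
    w_a, w_b = weights
    return w_a * a + w_b * b


# Hypothetical out-of-fold predictions from a GBM (a) and a DRF (b).
oof_gbm = [1.0, 2.0, 3.0, 4.0]
oof_drf = [2.0, 1.0, 4.0, 3.0]
target = [0.3 * a + 0.7 * b for a, b in zip(oof_gbm, oof_drf)]
weights = fit_two_model_stack(oof_gbm, oof_drf, target)
print(weights)
print(stack_predict(weights, 2.5, 3.5))
```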