Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

From the Sparkling Water GitHub repository:

Sparkling Water provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms.
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs.
  • Python interface enabling use of Sparkling Water directly from PySpark.

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6, use the 1.6 branch of Sparkling Water.
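As a rough illustration of the version rule above, the helper below derives the matching release line from a Spark version string. The function name `sparkling_water_line` is hypothetical and not part of any library; it only assumes that the release line shares Spark's major.minor number, as stated above for Spark 1.6.

```python
def sparkling_water_line(spark_version: str) -> str:
    """Return the major.minor release line for a Spark version string.

    Hypothetical helper: assumes the Sparkling Water line matches
    Spark's major.minor number (e.g. Spark 1.6.x -> 1.6).
    """
    parts = spark_version.strip().split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unrecognized Spark version: {spark_version!r}")
    return f"{parts[0]}.{parts[1]}"
```

In a PySpark session the input would typically come from `spark.version`; check the result against the official release table before installing anything.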

Recommended reference sources:

Sparkling-water installation guide
Sparkling water documentation
Sparkling-water GitHub Documentation

129 questions
2
votes
1 answer

Represent a list of items in input CSV for H2O

How do I represent a set/list of items in the input data (data frame) for H2O? I'm using Sparkling Water 1.6.5 with H2O Flow. My input data (columns in the CSV file) look like this: age: numeric, gender: enum, hobbies: ?, sports: ?. hobbies and sports…
Markus Kramer
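
One generic way to feed a list-valued column to a tabular learner is to expand it into one 0/1 indicator column per distinct item before import. The sketch below uses only the Python standard library; the `;` separator inside a cell, the sample rows, and the helper name `expand_list_column` are assumptions made for illustration, and this is not presented as H2O's recommended approach.

```python
import csv
import io


def expand_list_column(rows, column, sep=";"):
    """Replace a delimited list column with one 0/1 column per distinct item."""
    rows = [dict(r) for r in rows]
    # Collect every distinct item so that all rows get the same columns.
    items = sorted({i.strip() for r in rows for i in r[column].split(sep) if i.strip()})
    for r in rows:
        present = {i.strip() for i in r.pop(column).split(sep)}
        for item in items:
            r[f"{column}_{item}"] = 1 if item in present else 0
    return rows


# Illustrative data only; the ';' separator inside a cell is an assumption.
raw = "age,gender,hobbies\n34,M,chess;golf\n27,F,golf\n"
encoded = expand_list_column(list(csv.DictReader(io.StringIO(raw))), "hobbies")
```

Each indicator column can then be written back to CSV and parsed as a numeric or enum column.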
2
votes
1 answer

How to filter rows in H2OFrame (scala) based on a column value?

I am reading an H2OFrame from a CSV file: val h2oFrame = new H2OFrame(new File(inputCsvFilePath)) How can I perform the equivalent of a .filter() operation (as available for a Spark DataFrame or RDD)? For example, how do I get a new H2OFrame where…
S.P.
1
vote
1 answer

Import POJO model with Sparkling Water (Scala)

I am trying to import a POJO model into Sparkling Water. I am currently importing the model by compiling it using: javac -cp /opt/bitnami/commons/pojo.jar -J-Xmx2g -J-XX:MaxPermSize=256m…
1
vote
2 answers

Are H2O.ai products affected by log4shell vulnerability?

My question is whether open source H2O-3, open source Sparkling Water, and Driverless AI are affected by CVE-2021-44228 and CVE-2021-45046.
Michal
1
vote
0 answers

ERROR Instrumentation: java.lang.ClassNotFoundException: ai.h2o.sparkling.ml.models.H2OGBMMOJOModel

I am working with pysparkling in Databricks. I have built a model with PySpark transformers and an H2O pysparkling algorithm. When I log the model to MLflow and deploy it from a job cluster, I get the following error in the job cluster logs. I have…
1
vote
1 answer

Error in creating H2OContext in Databricks using pysparkling

I am using Spark version 2.4.4 and h2o-pysparkling-2.4 on Databricks and running the following code: h2oConf = H2OConf().set('spark.sql.autoBroadcastJoinThreshold', '-1'); hc = H2OContext.getOrCreate(conf=h2oConf). Sometimes it works well but…
1
vote
1 answer

Error when importing Sparkling Water (H2O) pipeline in Apache Spark: py4j.protocol.Py4JError

I recently created a PySpark pipeline using Sparkling Water's AutoML in the last stage (very similar to https://github.com/h2oai/sparkling-water/blob/master/py/examples/pipelines/ham_or_spam_multi_algo.py), but when I load my model from a file I get…
1
vote
1 answer

Get model metrics from MOJO model in H2O

I have a MOJO model that I want to explore for model metrics (RMSE, ROC, etc.). I understand all model metrics are available for a binary model, but I want to get these metrics from a MOJO model. Input - MOJO model and training dataset. Output -…
curios
1
vote
1 answer

How do I exclude algorithms from H2O AutoML in Sparkling Water by using Scala?

I have to exclude some algorithms from an AutoML model. I am trying the following to exclude algorithms, but it fails: buildSpecHopper_1.build_models.exclude_algos = Array(Algo.DeepLearning,Algo.GLM) But it throws Class cast…
1
vote
1 answer

H2O sparkling water - DNN mini_batch_size parameter

I'm currently running Spark 2.3.0 with sparkling-water 2.3.1. I found the documentation of the underlying H2O library by looking at the changelog that links to this. So apparently it uses H2O 3.18. By looking at the DNN I noticed the lack of a…
1
vote
0 answers

pysparkling H2OConf interfering with my application log

Here is my code:

from pysparkling import H2OConf  # commenting this line makes it work
import logging
logging.basicConfig(filename='my_log.log', level=logging.INFO)
logging.info('test')

I cannot get the log file to be created unless I comment the…
Tiberiu
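
A symptom like this often appears when something has already attached a handler to the root logger, because `logging.basicConfig` is a no-op in that case. Whether the pysparkling import actually does this is an assumption here, not something the excerpt confirms. The sketch below reproduces the behaviour with a stand-in handler and shows the `force=True` option available in Python 3.8 and later.

```python
import logging
import os
import tempfile

root = logging.getLogger()
# Stand-in for a library that configures the root logger during import.
root.addHandler(logging.NullHandler())

log_path = os.path.join(tempfile.mkdtemp(), "my_log.log")

# Silently ignored: the root logger already has a handler.
logging.basicConfig(filename=log_path, level=logging.INFO)
ignored = not os.path.exists(log_path)

# force=True (Python 3.8+) removes existing root handlers first.
logging.basicConfig(filename=log_path, level=logging.INFO, force=True)
logging.info("test")
logging.shutdown()

with open(log_path) as fh:
    contents = fh.read()
```

Note that `force=True` also discards any handlers the other library installed, so it may be preferable to attach a `logging.FileHandler` to a named logger instead.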
1
vote
1 answer

How can I embed H2O in a Java application?

I am trying to start embedded H2O in a Java application and train a model. However, I don't understand what exactly is explained in the documentation (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/java.html). Can anyone help me by providing an…
Esildor
1
vote
0 answers

Productionizing Spark Pipeline

I am using Sparkling Water/H2O v. 2.3 for prediction. I am trying to export a Spark pipeline model containing an H2O model. Scoring will need to be done on a Java-based platform. Please suggest the best method for this. I have been exploring it but could…
1
vote
2 answers

Sparkling Water won't start on Spark on Google DataProc

I'm trying to use H2O Sparkling Water on Google DataProc. I've successfully run Sparkling Water on a standalone Spark, and now moved on to use it on DataProc. Initially, I got an error about spark.dynamicAllocation.enabled not being supported, so…
1
vote
1 answer

Not able to convert a Spark Dataset to H2OFrame with asH2OFrame if the dataset is a streaming dataset

I already have a Deep Learning model. I am trying to run scoring on streaming data. For this, I am reading data from Kafka using the Spark Structured Streaming API. When I try to convert the received dataset to an H2OFrame, I get the error below: Exception…