Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

From the Sparkling Water GitHub repository:

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs
  • Python interface enabling use of Sparkling Water directly from pySpark

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; e.g., for Spark 1.6, use the Sparkling Water 1.6 branch.

Recommended reference sources:

Sparkling-water installation guide
Sparkling water documentation
Sparkling-water GitHub Documentation

129 questions
1
vote
0 answers

Why does H2O give different predictions on a Spark cluster than in Spark local mode?

H2O in Spark cluster mode gives different predictions from Spark local mode, and the local-mode predictions are better. Why is this happening? Can you tell me whether this is expected H2O behaviour? Two data sets are being used. One for…
poojanavin
  • 31
  • 4
1
vote
1 answer

How to export an h2o model as MOJO from sparkling water in scala, to be loaded by EasyPredictModelWrapper

My goal is to export an h2o model trained on spark with scala (using sparkling-water), such that I can import it in an application without Spark. Thus: using scala (the documentation only shows examples for r and python) export a model which is…
gerben
  • 692
  • 4
  • 16
1
vote
0 answers

NullPointerException PySparkling H2OFrame to Spark DataFrame

pysparkling 2.1 I run the following code: hc = H2OContext.getOrCreate(spark) h2o_frame = h2o.import_file('hdfs:path/to/my/file.csv') spark_frame = hc.as_spark_frame(h2o_frame) and it works just fine, just like in the documentation. But then when I…
Tiberiu
  • 990
  • 2
  • 18
  • 36
1
vote
1 answer

Is there any performance difference for ML Training between H2O Multi-node cluster and H2O Spark Cluster based on Sparkling Water?

I am curious about the cluster configuration environment in terms of the ML Training performance of H2O. If there are three nodes, is there a performance difference between configuring a generic H2O Multi-node Cluster and configuring an H2O Spark…
김태훈
  • 11
  • 1
1
vote
1 answer

Create Sparkling Water Cloud in Databricks using Python Notebook

I am trying to launch a Sparkling Water cloud within Spark using Databricks. I've attached the H2O library (3.16.0.2), PySparkling (pysparkling 0.4.6), and the Sparkling Water jar (sparkling-water-assembly_2.11-2.1.10-all.jar) to the cluster I'm…
Frank B.
  • 1,813
  • 5
  • 24
  • 44
1
vote
1 answer

What are the benefits of Sparkling Water over the H2O machine learning library?

I've understood that Sparkling Water is H2O executed on a Spark environment and so it can use the Spark Engine (and all Spark distributed structures) to distribute computing, but in terms of performance, what are the benefits, since H2O is already a…
xcsob
  • 63
  • 1
  • 9
1
vote
1 answer

Sparkling Water fails to create h2oContext in simple spark project

I am setting up Sparkling Water for the first time on a standalone cluster running Spark 2.2. I have run Sparkling Water on such a cluster before via R (using rsparkling + sparklyr + h2o), but am having issues setting this up as a Spark application…
renegademonkey
  • 457
  • 1
  • 7
  • 18
1
vote
0 answers

Cannot rename spark tables column names in sparklyr/rsparkling

Getting knee deep with sparklyr/rsparkling, I have some spark tables with annoying column names and I would like to rename them. But I cannot seem to do it. library(sparklyr) library(rsparkling) library(dplyr) library(DBI) sc <-…
Chris
  • 1,219
  • 2
  • 11
  • 21
1
vote
2 answers

LDAP authentication using sparkling-water

We need to authenticate users using LDAP in Sparkling Water. We tried configuring this using Sparkling Water 1.6.13 and H2O 3.14.0.2. Below is the configuration: *ldaploginmodule { org.eclipse.jetty.plus.jaas.spi.LdapLoginModule required …
1
vote
1 answer

How to change port of web UI with pysparkling

I'm just trying to get pysparkling working, but with a different port for the web UI. I've looked in the help files and they seem to reference old versions of Sparkling Water. Currently I am running from pysparkling import * hc =…
chib
  • 13
  • 2
1
vote
0 answers

Sparkling water local mode cluster error

I'm trying to extend the hamOrSpam example (https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/hamOrSpam.script.scala) to make parallel predictions for a large dataset using Spark's parallel computation power (during the inference…
siv
  • 31
  • 5
1
vote
1 answer

H2O error when calling as.factor on H2O data frame

When I call the following reproducible code: install.packages("h2o", type = "source", repos = …
Levi Brackman
  • 325
  • 2
  • 17
1
vote
1 answer

Why does H2O integrate TensorFlow via Spark instead of directly?

I really like H2O especially because you can deploy the built models easily into any Java / JVM application... This is also my goal for TensorFlow: Build models and then run them in Java applications. H2O uses Spark (Sparkling Water) "in the middle"…
Kai Wähner
  • 5,248
  • 4
  • 35
  • 33
1
vote
0 answers

Making H2O grid search deterministic

In order to run the h2o RandomDiscreteValueWalker[DRFParameters] with deterministic results, is it sufficient to set the seed on the DRFParameters and the RandomDiscreteValueSearchCriteria ? I get non-deterministic results even when I have the seed…
x89a10
  • 681
  • 1
  • 8
  • 23
1
vote
0 answers

Curl connection in H2O 3.11.4.8 using Apache Hadoop 2.7.3

I have installed HDP 2.6 on a computer cluster with only 2 nodes. Each node has a 2-core processor, 8 GB of RAM, and a 40 GB hard disk. I also installed Apache Hadoop 2.7.3. Because of that, I can run H2O 3.11.4.8 using YARN. But,…
Rendi 7936
  • 11
  • 3