Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

According to the Sparkling Water GitHub repository, it provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms.
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs.
  • Python interface enabling use of Sparkling Water directly from PySpark.

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6 use the Sparkling Water 1.6 branch.
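The version rule above can be sketched as a small helper. This is only an illustration: the function name `sparkling_water_branch` is hypothetical, and the `rel-<major>.<minor>` branch naming is an assumption inferred from the `rel-2.2` path in the example URL quoted further down this page, so check the repository for the branches that actually exist.

```python
import re


def sparkling_water_branch(spark_version: str) -> str:
    """Map a Spark version string to the assumed Sparkling Water branch name.

    Assumption: branches are named "rel-<major>.<minor>" after the Spark
    release they target. The function name is hypothetical.
    """
    # Keep only the leading "<major>.<minor>" part, e.g. "1.6" from "1.6.0".
    match = re.match(r"^(\d+)\.(\d+)", spark_version.strip())
    if match is None:
        raise ValueError(f"Unrecognised Spark version: {spark_version!r}")
    major, minor = match.groups()
    return f"rel-{major}.{minor}"


if __name__ == "__main__":
    for version in ("1.6.0", "2.2.1"):
        print(version, "->", sparkling_water_branch(version))
```

The patch component of the Spark version is ignored on purpose, since the text ties each branch to a Spark release line rather than to a specific patch release.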

Recommended reference sources:

Sparkling Water installation guide
Sparkling Water documentation
Sparkling Water GitHub documentation

129 questions
0
votes
1 answer

Weird tracebacks when shutting down Sparkling Water context

To reproduce, use the simplest Sparkling Water Python example (https://github.com/h2oai/sparkling-water/blob/rel-2.2/py/examples/scripts/H2OContextInitDemo.py): from pysparkling import * from pyspark.sql import SparkSession import h2o # Initiate…
omikron
0
votes
1 answer

Continuous "Got IO error when sending batch UDP bytes: java.net.ConnectException: Connection refused" in RSparkling on CDH-5.10.2

I'm trying to execute this RSparkling example on an offline CDH-5.10.2 cluster. My environment is: Spark 1.6.0; sparklyr 0.6.2; h2o 3.10.5.2; rsparkling 0.2.1. I use a custom Sparkling Water JAR, which is basically 1.6.12 with this PR…
Igor Melnichenko
0
votes
2 answers

Trouble getting latest Sparkling Water (2.2) to work with R (via rsparkling)

I'm having issues updating rsparkling to work with Sparkling Water 2.2 and Spark 2.2. Everything worked with previous versions (<2.1). I have installed the rsparkling version R package that comes with the latest Sparkling Water 2.2 binaries (as per…
renegademonkey
0
votes
1 answer

Model Serialization in H2O.ai Sparkling Water

Has anyone already worked with serialized models in Sparkling Water, or exported models as in Spark, to put them in production? How can I do that? Thanks in advance. Flavio
Flavio
0
votes
0 answers

Get "Provider org.apache.spark.h2o.RestAnnouncementProvider could not be instantiated" when creating H2OContext

I'm trying to start an H2O context in PySpark using H2OContext.getOrCreate(sc). With the Python packages h2o=3.10.4.8 and h2o-pysparkling-1.6=1.6.8 this works as expected (packages installed using pip); however, with h2o-pysparkling-1.6.11 I get the…
Heuvel
0
votes
1 answer

Error with H2OContext running PySparkling with Spark 2.1

I'm getting this error when trying to run a PySparkling script on an AWS EMR cluster. I can get everything to work when downloading Sparkling Water 2.1.8 and running it from a pysparkling shell. However, spark-submit does not seem to work.…
Keston
0
votes
1 answer

How to load and save models in Sparkling Water

I want to store a model created within Sparkling Water as a binary file so that I can reload it with a different application. What is the best way?
Stefan Papp
0
votes
1 answer

H2O Mojo model from DRFModel

Given a trained DRFModel instance in Scala, what's the best way of generating the corresponding MojoModel object for scoring? The APIs I've seen so far are mostly about exporting to a file and then loading it back using the…
x89a10
0
votes
1 answer

H2O + R + Flow integration

I am trying to connect to Sparkling Water using R and also analyze my data frames in H2O Flow. I could connect to the Spark instance from R using the sparklyr and rsparkling packages and generate a few H2O data frames. Please advise how I can access the…
0
votes
1 answer

Scores order guarantees while scoring a H2OFrame

While going over the Sparkling Water examples, a common pattern for scoring and collecting scores over an H2OFrame is the following: val predictionH2OFrame = dlModel.score(result)('predict) val predictionsFromModel =…
x89a10
0
votes
1 answer

getOrCreate deployment failing randomly

When attempting to call H2OContext.getOrCreate with a valid SparkContext, we keep seeing random deployment failures: 17/04/21 17:21:32 ERROR TaskSchedulerImpl: Lost executor 0 on 172.17.0.4: Remote RPC client disassociated. Likely due to containers…
deepelement
0
votes
1 answer

Error with hc=H2OContext.getOrCreate(sc) in pysparkling

I am new to PySparkling. I work with a YARN cluster, Spark 1.6, Cloudera CDH 5.8.0, and Python 2.7.6, and I have a problem with hc=H2OContext.getOrCreate(sc). Do you have any ideas? from pysparkling import * import h2o hc = H2OContext.getOrCreate(sc)…
0
votes
1 answer

error: value trainModel is not a member of hex.tree.gbm.GBM

When I try to add H2O to Spark and use a GBM model, I get this exception while packaging it. This is my first time running H2O with Spark; I just added the H2O libraries to my Spark app and used the GBM from H2O.
0
votes
1 answer

Sparkling Water working in YARN client mode but not in cluster mode

I am trying to submit my Sparkling Water application in YARN cluster mode, but it fails. However, it runs in client mode. I am using the following to submit my jar: spark2-submit --class --conf…
0
votes
1 answer

Integrating a Spark MLlib algorithm with H2O.ai using Sparkling Water

I am trying to integrate the collaborative filtering algorithm in Spark MLlib with H2O.ai using Sparkling Water for product recommendation. I followed this link http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html and updated code as in…
mvg