Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

From the Sparkling Water GitHub repository:

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O's frames and vice versa
  • A DSL to use Spark data structures as input for H2O's algorithms
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs
  • A Python interface enabling use of Sparkling Water directly from PySpark

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6, use the Sparkling Water 1.6 branch.

Recommended reference sources:

Sparkling Water installation guide
Sparkling Water documentation
Sparkling Water GitHub documentation

129 questions
0
votes
1 answer

How to embed custom plots in the H2O Flow UI in Scala or R?

After some investigation, I have found out that the Sparkling Water H2O Flow UI has a very limited set of plots for data visualization in Scala: just box plots and distributions. But if I want to use a third party library (need recommendations on…
user3243499
  • 2,953
  • 6
  • 33
  • 75
0
votes
1 answer

How to make plots for data exploration in Sparkling Water's H2O Flow UI?

I am trying to explore a sample dataset by visualizing the data using plots like box plots, scatter plots, histograms, etc. Unfortunately, I am not able to find any commands, even in the H2O documentation, on how to print or show plots of data. Is there…
user3243499
  • 2,953
  • 6
  • 33
  • 75
0
votes
0 answers

Unable to start sparkling shell in Windows

I am running Windows 10 and using Scala 2.11 and Spark version 2.2.1. SPARK_HOME is also configured, its bin directory is added to the path, and I can start spark-shell directly from the command prompt from any directory. But then when I run…
user3243499
  • 2,953
  • 6
  • 33
  • 75
0
votes
1 answer

Not able to create H2OContext in Databricks using pysparkling

I am not able to create an H2OContext in Spark on Databricks using pysparkling. It gives the following error. Code: from pysparkling import *; import h2o; h2oConf = H2OConf(spark); h2oConf.set("spark.ui.enabled", True) Out[2]: Sparkling…
0
votes
1 answer

Configuring pysparkling logger with custom format

I'm trying to productionize a Python Sparkling Water application and I want to unify logging formats from my app, Spark and H2O. I was able to modify log4j.properties in Spark home and achieve it with Spark logs, however, H2O logs don't have format…
omikron
  • 2,745
  • 1
  • 25
  • 34
0
votes
0 answers

Can't load a 2.3 TB file into a Sparkling Water cluster with 10 TB memory

I am having the following issue with Sparkling Water version 2.2.9. My Hadoop cluster is running CDH 5.13. As per the H2O documentation, I should have roughly 4x the memory as the data size in the H2O/Sparkling Water cluster. I can import a 750 GB data…
VShankar
  • 151
  • 3
0
votes
2 answers

Run multiple instances of Sparkling Water on the same cluster

Two concurrent H2OContext instances created on the same driver seem to conflict with each other. When one is running, the other one will throw errors. Can we do some configuration such that two instances of Sparkling Water can run in parallel?
0
votes
1 answer

Force H2O Sparkling Water cluster to start on a specific machine in YARN mode

Tools used: Spark 2, Sparkling Water (H2O), Zeppelin notebook, PySpark. Code: I'm starting H2O in INTERNAL mode from my Zeppelin notebook, since my environment is YARN. I'm using the basic command: from pysparkling import * hc =…
orryk
  • 1
  • 2
0
votes
0 answers

java.lang.IllegalArgumentException: Operation not allowed on string vector

val airlinesDf = spark.read.csv("input file"); val airlinesData: H2OFrame = airlinesDf; val airlinesTable: RDD[Airlines] = asRDD[Airlines](airlinesDf); val flightsToORD = airlinesTable.filter(f => f.Dest == Some("ORD")); flightsToORD.count() when…
0
votes
1 answer

Get Distance of Point From Cluster Centroid in H2O KMeans Clustering

In H2O KMeans clustering, is there a way to calculate the actual distances from the cluster centroids for each point in the data set? Currently H2O gives the predicted cluster for the data passed, but what is the best way of getting the distance of a point…
0
votes
2 answers

Can't start H2O cluster for manual Sparkling Water backend

I'm trying to start an H2O cluster manually as an external backend for Sparkling Water. The documentation here says I need to use the parameter 'name' with the extended H2O driver. But by doing so it says that the parameter 'name'…
Markus Wilhelm
  • 171
  • 2
  • 2
  • 11
0
votes
0 answers

H2O Sparkling Water and Duke Library

We started a POC of Sparkling Water and realized it uses the Duke library internally. Not sure which features of the Duke library H2O uses. Duke allows custom comparators; does H2O expose this feature? I have gone through the source code and…
Sateesh K
  • 1,071
  • 3
  • 19
  • 45
0
votes
0 answers

Configuring Eclipse with Sparkling Water

My objective is to use Eclipse for coding Sparkling Water on Mac. I have been able to install Java, Scala and Spark 2.2 with brew install, and Sparkling Water as binaries, and have been actively coding in Jupyter notebook. I have also been able to…
user1124702
  • 1,015
  • 4
  • 12
  • 22
0
votes
1 answer

Are the nodes in H2O Sparkling Water preemptible?

I am running Sparkling Water over 36 Spark executors. Due to YARN's scheduling, some executors get preempted and come back later. Overall, there are 36 executors for the majority of the time, just not always. So far, my experience is that, as soon as 1…
axiom
  • 406
  • 1
  • 4
  • 16
0
votes
1 answer

Getting an exception in H2O when creating the context

I get the exception below in my code when I try to create an H2O context with Spark 1.6.3: 17/11/06 12:01:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[H2O Launcher thread,5,main] java.lang.NoSuchMethodError:…
Luckylukee
  • 575
  • 2
  • 9
  • 27