Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

From the Sparkling Water GitHub repository:

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms.
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs.
  • Python interface enabling use of Sparkling Water directly from PySpark.

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6 use the Sparkling Water 1.6 branch.
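As a rough illustration of the version-matching rule above, the following sketch derives the release line (for example 1.6) from a full Spark version string, so it can be compared against the Sparkling Water branch you intend to use. The helper name `spark_release_line` is hypothetical and not part of Spark or Sparkling Water; exact branch names should be checked in the Sparkling Water repository.

```python
import re


def spark_release_line(spark_version: str) -> str:
    """Return the 'major.minor' part of a Spark version string.

    Hypothetical helper for illustration only; it is not part of
    Spark or Sparkling Water.
    """
    match = re.match(r"^(\d+)\.(\d+)", spark_version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


# Spark 1.6.2 belongs to the 1.6 release line, so the matching
# Sparkling Water branch is the one built for Spark 1.6.
print(spark_release_line("1.6.2"))
```

In a running PySpark session the input string would typically come from `spark.version` or `sc.version`.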

Recommended reference sources:

Sparkling-water installation guide
Sparkling water documentation
Sparkling-water GitHub Documentation

129 questions
0
votes
1 answer

GBM training with Sparkling Water on EMR failing with increased data size

I’m trying to train a GBM on an EMR cluster with 60 c4.8xlarge nodes using Sparkling Water. The process runs successfully up to a specific data size. Once I hit a certain data size (number of training examples) the process freezes in the collect…
Amir Ziai
  • 148
  • 1
  • 6
0
votes
2 answers

Building a minimal Sparkling Water application

I am new to Sparkling Water. I know how to run my program from sparkling-shell. However, I am not sure how to build a standalone application that I can give as an input to spark-submit. What are the jars that I need to include to build my…
0
votes
1 answer

worker to worker communication in Sparkling Water

I think the system diagram implies that Sparkling Water implements direct worker-to-worker communication (without going back to the master). Can someone point out where the code for that feature is?
bhomass
  • 3,414
  • 8
  • 45
  • 75
0
votes
1 answer

Databricks + H2O PySparkling: addURL Py4JException

I am a newbie to H2O and the Spark framework, and I am having trouble onboarding H2O + Spark (sparkling-water) PySparkling in Databricks. I have a 12-worker cluster running in Databricks in a 1.5.2 environment. The steps I took were as follows: 1.…
ASG
  • 15
  • 2
0
votes
2 answers

H2o Number of Executors not working

I start the sparkling-shell with the following command. ./bin/sparkling-shell --num-executors 4 --executor-memory 4g --master yarn-client I only ever get two executors. Is this an H2o problem, YARN problem, or Spark problem? Mike
uh_big_mike_boi
  • 3,350
  • 4
  • 33
  • 64
0
votes
2 answers

How to force an H2OFrame column as of type Integer in scala?

I am training a DRFModel, and while evaluating it I receive an exception: Exception in thread "main" java.lang.ClassCastException: hex.ModelMetricsRegression cannot be cast to hex.ModelMetricsBinomial. The data has a column called "label" that contains…
S.P.
  • 41
  • 5
0
votes
1 answer

Sparkling Water - run python script as a Spark Application

I have some trouble running a Python script as a Spark application with Sparkling Water. I use this command to execute my script on Spark: ./bin/spark-submit \ --packages ai.h2o:sparkling-water-core_2.10:1.5.12 \ --py-files…
pierre_comalada
  • 300
  • 3
  • 11
-1
votes
0 answers

Pysparkling issue with JDK 11

I have a process that performs anomaly detection using Isolation Forest (pysparkling) on a pyspark data frame. It performs multiple steps including... #initialising h2o hc = H2OContext.getOrCreate() h2o_df = hc.asH2OFrame(df) # #…
-2
votes
1 answer

When import file, it reads it as CSV and garbles the data

Running sparkling-shell (tried versions 2.2.2 - 2.2.6) with Spark2 (under CDH 5.13 under Linux 7.2). CSV and ZIP files import fine, but when I tried to import a Parquet file, it reads it as CSV and garbles the data. Anyone has any…
VShankar
  • 151
  • 3