Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

Utilities to publish Spark data structures (RDDs, DataFrames) as H2O frames and vice versa
A DSL to use Spark data structures as input for H2O's algorithms
Basic building blocks to create ML applications utilizing Spark and H2O APIs
A Python interface enabling use of Sparkling Water directly from PySpark
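
As a rough illustration of the first utility listed above, here is a minimal Python sketch of a round trip between a Spark DataFrame and an H2O frame. `round_trip` is a hypothetical helper, not part of Sparkling Water. `H2OContext.getOrCreate()` and `asH2OFrame` appear in one of the questions on this page; `asSparkFrame` is the name used in recent PySparkling releases as far as I know, and older releases may expose differently named methods, so treat it as an assumption to check against your version.

```python
def round_trip(hc, spark_df):
    """Publish a Spark DataFrame as an H2O frame, then convert it back.

    `hc` is expected to be an H2OContext, for example the object returned by
    `pysparkling.H2OContext.getOrCreate()`. This helper is hypothetical and
    only wraps the two conversion calls.
    """
    # Spark DataFrame -> H2O frame
    h2o_frame = hc.asH2OFrame(spark_df)
    # H2O frame -> Spark DataFrame (method name assumed; verify for your version)
    spark_again = hc.asSparkFrame(h2o_frame)
    return h2o_frame, spark_again
```

In a PySparkling session this would typically be called as `round_trip(H2OContext.getOrCreate(), df)`, where `df` is an existing Spark DataFrame.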

Getting Started

Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release; for example, for Spark 1.6 use the Sparkling Water 1.6 branch.
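
The version rule above can be sketched as a small Python helper. `sparkling_water_line` is a hypothetical name, and the sketch assumes the Sparkling Water line simply equals the Spark `major.minor` version, as the description suggests. Newer releases may follow a different versioning scheme, so check the official compatibility information before relying on it.

```python
import re


def sparkling_water_line(spark_version: str) -> str:
    """Return the 'major.minor' line of a Spark version string.

    Assumption: the matching Sparkling Water branch uses the same
    major.minor number as the Spark release.
    """
    match = re.match(r"^(\d+)\.(\d+)", spark_version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    major, minor = match.groups()
    return f"{major}.{minor}"
```

For instance, passing the full version string reported by your Spark installation yields the line to look for among the Sparkling Water branches.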

Recommended reference sources:

Sparkling Water installation guide
Sparkling Water documentation
Sparkling Water GitHub documentation

129 questions

votes

1 answer

GBM training with Sparkling Water on EMR failing with increased data size

I’m trying to train a GBM on an EMR cluster with 60 c4.8xlarge nodes using Sparkling Water. The process runs successfully up to a specific data size. Once I hit a certain data size (number of training examples) the process freezes in the collect…

asked Feb 14 '17 at 18:24

Amir Ziai

votes

2 answers

Building a minimal Sparkling Water application

I am new to Sparkling Water. I know how to run my program from sparkling-shell. However, I am not sure how to build a standalone application that I can pass as input to spark-submit. What are the jars that I need to include to build my…

apache-spark h2o sparkling-water

asked Feb 03 '17 at 21:09

Souradeep Basu

votes

1 answer

worker to worker communication in Sparkling Water

I think the system diagram implies that Sparkling Water implements direct worker-to-worker communication (without going back to the master). Can someone point out where the code for that feature is?

apache-spark sparkling-water

asked Jul 25 '16 at 00:53

bhomass

votes

1 answer

Databricks + H2O PySparkling: addURL Py4JException

I am new to H2O and the Spark framework, and I am having trouble onboarding H2O + Spark (sparkling-water) PySparkling in Databricks. I have a 12-worker cluster running in Databricks in a 1.5.2 environment. The steps I took were as follows: 1.…

python pyspark jupyter-notebook h2o sparkling-water

asked May 28 '16 at 23:28

ASG

votes

2 answers

H2O number of executors not working

I start the sparkling-shell with the following command:
./bin/sparkling-shell --num-executors 4 --executor-memory 4g --master yarn-client
I only ever get two executors. Is this an H2O problem, a YARN problem, or a Spark problem? Mike

hadoop apache-spark h2o sparkling-water

asked May 13 '16 at 17:44

uh_big_mike_boi

votes

2 answers

How to force an H2OFrame column to be of type Integer in Scala?

I am training a DRFModel and, while evaluating it, I receive an exception: Exception in thread "main" java.lang.ClassCastException: hex.ModelMetricsRegression cannot be cast to hex.ModelMetricsBinomial. The data has a column called "label" that contains…

scala h2o sparkling-water

asked May 06 '16 at 00:42

S.P.

votes

1 answer

Sparkling Water - run Python script as a Spark application

I am having trouble running a Python script as a Spark application with Sparkling Water. I use this command to execute my script on Spark:
./bin/spark-submit \
  --packages ai.h2o:sparkling-water-core_2.10:1.5.12 \
  --py-files…

python pyspark h2o sparkling-water

asked Apr 12 '16 at 20:26

pierre_comalada

-1

votes

0 answers

Pysparkling issue with JDK 11

I have a process that performs anomaly detection using Isolation Forest (PySparkling) on a PySpark DataFrame. It performs multiple steps including...
#initialising h2o
hc = H2OContext.getOrCreate()
h2o_df = hc.asH2OFrame(df)
# #…

apache-spark pyspark java-11 h2o sparkling-water

asked Aug 09 '23 at 16:04

Yash Nahar

-2

votes

1 answer

When importing a file, it reads it as CSV and garbles the data

Running sparkling-shell (tried versions 2.2.2 to 2.2.6) with Spark2 (under CDH 5.13 on Linux 7.2). CSV and ZIP files import fine, but when I tried to import a Parquet file, it read it as CSV and garbled the data. Does anyone have any…

parquet h2o sparkling-water

asked Jan 23 '18 at 20:59

VShankar
