Questions tagged [sparkling-water]

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark.

From the Sparkling Water GitHub repository:

Sparkling Water integrates H2O's fast scalable machine learning engine with Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames) as H2O frames and vice versa
  • A DSL to use Spark data structures as input for H2O's algorithms
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs
  • A Python interface enabling use of Sparkling Water directly from PySpark

Getting Started

  • Select the right version

Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark release line (major.minor version); for example, for Spark 1.6, use the Sparkling Water 1.6 branch.
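The branch-matching rule can be expressed as a small version check. This is a minimal sketch, assuming the plain major.minor pairing described here (Spark X.Y goes with Sparkling Water X.Y); the helpers `major_minor`, `matching_branch`, and `is_compatible` are hypothetical and not part of any Sparkling Water API.

```python
import re


def major_minor(version: str) -> str:
    # Extract the leading "major.minor" part of a dotted version string,
    # e.g. "1.6.10" -> "1.6".
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return "{}.{}".format(*match.groups())


def matching_branch(spark_version: str) -> str:
    # Assumed rule: Spark X.Y pairs with the Sparkling Water X.Y branch.
    return major_minor(spark_version)


def is_compatible(spark_version: str, sparkling_water_version: str) -> bool:
    # True when both versions belong to the same major.minor release line.
    return major_minor(spark_version) == major_minor(sparkling_water_version)
```

For example, `matching_branch("1.6.3")` gives `"1.6"`, and `is_compatible("2.1.0", "2.0.3")` is `False`, flagging a Spark 2.1 installation paired with a Sparkling Water 2.0.x build.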

Recommended reference sources:

Sparkling Water installation guide
Sparkling Water documentation
Sparkling Water GitHub documentation

129 questions
1 vote, 0 answers

Running pysparkling-water using Livy spark failed

I have been able to run the ChicagoCrimeDemo.py script using spark-submit successfully (spark-submit --master=yarn-client --py-files /opt/sparkling-water-1.6.10/py/build/dist/h2o_pysparkling_1.6-1.6.10-py2.7.egg…
1 vote, 1 answer

How to interpret results from Sparkling Water's GBM algorithm on classification task

I'm new to Sparkling Water and machine learning. I've built a GBM model with two datasets divided manually into train and test. The task is classification with all-numeric attributes (the response column is converted to enum type). Code is in Scala. val…
velaciela
1 vote, 0 answers

RSparkling: SQLException while accessing Hive's metastore_db from RSparkling

I am running RSparkling on my local system with Apache Spark 2.0.1. When I call h2o_context(sc), I get a permission exception for /tmp/hive, whose permissions I set using winutils.exe. After that, when I try to run the following command mtcars_tbl <- copy_to(sc,…
Mansoor
1 vote, 1 answer

Can I use only some of the columns that were used to create a GBM model and still predict in supervised learning?

In my GBM model, I have nearly 150 columns used to train and create the model. For some records, I won't be getting all the columns. Will the model still work in that case? I don't want to set the missing values to 0.
1 vote, 2 answers

H2O package not found in Scala Sparkling Water

I am trying to run Sparkling Water on my local instance of Spark 2.1.0. I followed the H2O documentation for Sparkling Water, but when I try to execute sparkling-shell.cmd, I get the following error: The filename, directory name, or volume label…
Mansoor
1 vote, 3 answers

Spark Shell: The filename, directory name, or volume label syntax is incorrect

I am getting an error while running spark-shell.cmd with the following parameters "C:\SoftwareLibraries\spark\spark-2.0.1\bin\spark-shell.cmd" --jars…
Mansoor
1 vote, 1 answer

sparklyr + rsparkling: Error while connecting to a cluster

For some time I've been using the sparklyr package to connect to my company's Hadoop cluster using the…
Maju116
1 vote, 2 answers

Create a job that goes through H2O Flow automatically

I have created a flow to predict something with the distributed random forest model, and now I want to predict every few days without using the Flow GUI. Is there a way to automate an H2O Flow or to translate the entire script into java/python…
1 vote, 1 answer

Understanding Sparkling Water

I am new to Sparkling Water and want to ask some quick questions: Does Sparkling Water support all the algorithms that both Spark MLlib and H2O provide? Does Sparkling Water itself provide algorithms that Spark MLlib and H2O don't support? If I…
Tom
1 vote, 0 answers

How to run Sparkling Water example with spark in local mode

I am trying to run the Sparkling Water deep learning demo in IntelliJ IDEA. The code link is: https://github.com/h2oai/sparkling-water/blob/RELEASE-2.0.3/examples/src/main/scala/org/apache/spark/examples/h2o/DeepLearningDemo.scala It fails to start, the…
Tom
1 vote, 1 answer

Sparkling Water: out of memory when converting a Spark DataFrame to an H2O DataFrame

I am trying to convert a Spark DataFrame to an H2O DataFrame. For the Spark setup, I am using .setMaster("local[1]") .set("spark.driver.memory", "4g") .set("spark.executor.memory", "4g"), and I tried H2O 2.0.2 and H2O 1.6.4. I got the same error with both…
lserlohn
1 vote, 2 answers

H2O Sparkling Water: save frame to disk

I am trying to import a frame by creating an H2O frame from a Spark Parquet file. The file is 2 GB and has about 12M rows and sparse vectors with 12k columns. It is not that big in Parquet format, but the import takes forever. In H2O it is actually reported…
samst
1 vote, 1 answer

Unable to find class: org.apache.spark.h2o.package$StringHolder

I am trying the simple droplet https://github.com/h2oai/sparkling-water program, but I am unable to make it run successfully using spark-submit. I used Sparkling Water 1.6.4, as in the sample code. spark-submit --jars…
lserlohn
1 vote, 1 answer

The purpose of creating an H2O model

In the demo code https://github.com/h2oai/sparkling-water/blob/master/py/examples/notebooks/TensorFlowDeepLearning.ipynb I can more or less make out what the code is doing. My question is: what is the advantage of creating the H2O model at the…
bhomass
1 vote, 2 answers

Sparkling Water: Can't make use of the Spark ML pipelines support

According to this blog post by the Sparkling Water team, you can now use Spark ML pipeline components to build a DL model in the latest versions. I tried adding the latest versions to my build.sbt: "org.apache.spark" % "spark-mllib_2.10" %…
void