Questions tagged [apache-zeppelin]

Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics. You can make beautiful data-driven, interactive, and collaborative documents with SQL, Python, Scala, and more. It also supports Markdown syntax.

Apache Zeppelin home page

1460 questions
11
votes
1 answer

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql. Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom…
Wanchun
  • 165
  • 1
  • 9
10
votes
1 answer

Spark throws java.util.NoSuchElementException: key not found: 67

Running the Spark bisecting k-means algorithm in Zeppelin. //I transform my data using the TF-IDF algorithm val idf = new IDF(minFreq).fit(data) val hashIDF_features = idf.transform(dbTF) //and parse the transformed data to the clustering…
Mnemosyne
  • 1,162
  • 4
  • 13
  • 45
10
votes
4 answers

Spark 1.6: filtering DataFrames generated by describe()

The problem arises when I call the describe function on a DataFrame: val statsDF = myDataFrame.describe() Calling the describe function yields the following output: statsDF: org.apache.spark.sql.DataFrame = [summary: string, count: string] I can show…
Rami
  • 8,044
  • 18
  • 66
  • 108
9
votes
3 answers

Apache Zeppelin 0.7.3 - http error 503 in browser

Following the minimalist installation instructions from here, then on macOS High Sierra 10.13.1 executing: bin/zeppelin-daemon.sh start The daemon starts OK, but pointing any browser to http://localhost:8080 yields HTTP ERROR: 503 Problem…
jtlz2
  • 7,700
  • 9
  • 64
  • 114
9
votes
1 answer

Apache Zeppelin - How to use Helium framework in Apache Zeppelin

From Zeppelin-0.7, Zeppelin started supporting Helium plugins/packages using the Helium Framework. However, I am not able to view any of the plugins on the Helium page (localhost:8080/#/helium). As per this JIRA, I placed sample Helium.json (available on s3)…
Nikhil Bhide
  • 728
  • 8
  • 23
9
votes
4 answers

How to add a jar in zeppelin?

How do I add a jar in Zeppelin for the %hive interpreter? I have tried %z.dep(''); add jar. Also, the Zeppelin hive interpreter throws ClassNotFoundException. Adding to ./interpreter/hive/ throws a thrift exception, while add jar says file not…
user 923227
  • 2,528
  • 4
  • 27
  • 46
9
votes
1 answer

Is data returned in %jdbc paragraph available in subsequent paragraphs?

If a paragraph returns data from the %jdbc interpreter, is that data available to following paragraphs that use other interpreters? e.g. %jdbc(psql) select * from `table` then %python # load / access data here x = ... In the same way that a…
9
votes
2 answers

export data in csv using zeppelin

I need to export data in CSV format from my %sql interpreter in Zeppelin. How can I do so? I need to add a button that, when clicked, exports the data as CSV on the client side, as shown by the graphs in the SQL interpreter in Zeppelin.
Nipun
  • 4,119
  • 5
  • 47
  • 83
9
votes
1 answer

Zeppelin Notebook Storage in local Git repository

I have followed the instructions for setting up Zeppelin Notebook Storage in a local Git repository here: https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/storage/storage.html#Git But I am still unclear about how I can store…
Eoin Lane
  • 641
  • 2
  • 6
  • 22
9
votes
2 answers

Apache - Zeppelin using variables across paragraphs

I am trying to accomplish the following use case on Apache Zeppelin: When I write an sql query, for example %sql SELECT * FROM table1 WHERE column1 = ${column1=1,1|2|3|4} I get a combo box displayed with these values (1,2,3,4) as options. What I…
kunalc92
  • 91
  • 1
  • 5
8
votes
1 answer

How to set spark.driver.memory for Spark/Zeppelin on EMR

When using EMR (with Spark, Zeppelin), changing spark.driver.memory in Zeppelin Spark interpreter settings won't work. I wonder what is the best and quickest way to set Spark driver memory when using EMR web interface (not aws CLI) to create…
Rami
  • 8,044
  • 18
  • 66
  • 108
8
votes
1 answer

How can I select a stable subset of rows from a Spark DataFrame?

I've loaded a file into a DataFrame in Zeppelin notebooks like this: val df = spark.read.format("com.databricks.spark.csv").load("some_file").toDF("c1", "c2", "c3") This DataFrame has >10 million rows, and I would like to start work with just a…
Karmen
  • 367
  • 1
  • 3
  • 9
8
votes
2 answers

Reading Avro File in Spark

I have read an Avro file into a Spark RDD and need to convert it into a SQL DataFrame. How do I do that? This is what I did so far. import org.apache.avro.generic.GenericRecord import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper} import…
Gayatri
  • 2,197
  • 4
  • 23
  • 35
8
votes
2 answers

Configure Zeppelin's Spark Interpreter on EMR when starting a cluster

I am creating clusters on EMR and configuring Zeppelin to read the notebooks from S3. To do that I am using a JSON object that looks like this: [ { "Classification": "zeppelin-env", "Properties": { }, "Configurations": [ { …
Rami
  • 8,044
  • 18
  • 66
  • 108
8
votes
2 answers

Permission denied: user=zeppelin while using %spark.pyspark interpreter in AWS EMR cluster

I have created a PySpark structured streaming program and am trying to execute it in the Zeppelin notebook: %spark.pyspark query_window = windowedCounts \ .writeStream \ .outputMode("complete") \ …