Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Python, Scala and more. It also supports Markdown syntax.
Questions tagged [apache-zeppelin]
1460 questions
11 votes, 1 answer
How to connect Zeppelin to Spark 1.5 built from the sources?
I pulled the latest source from the Spark repository and built it locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom…

Wanchun
10 votes, 1 answer
Spark throws java.util.NoSuchElementException: key not found: 67
I am running Spark's bisecting k-means algorithm in Zeppelin:
//I transform my data using the TF-IDF algorithm
val idf = new IDF(minFreq).fit(data)
val hashIDF_features = idf.transform(dbTF)
//and parse the transformed data to the clustering…

Mnemosyne
10 votes, 4 answers
Spark 1.6: filtering DataFrames generated by describe()
The problem arises when I call the describe function on a DataFrame:
val statsDF = myDataFrame.describe()
Calling describe yields the following output:
statsDF: org.apache.spark.sql.DataFrame = [summary: string, count: string]
I can show…

Rami
9 votes, 3 answers
Apache Zeppelin 0.7.3 - http error 503 in browser
Following the minimalist installation instructions from here, I ran the following on macOS High Sierra 10.13.1:
bin/zeppelin-daemon.sh start
The daemon starts OK, but pointing any browser to http://localhost:8080 yields
HTTP ERROR: 503
Problem…
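The `HTTP ERROR: 503` page is itself an HTTP response, so something is listening on port 8080: the daemon's web server started, but the Zeppelin web application behind it is not serving requests. A minimal Python sketch for telling that case apart from one where nothing is listening at all; the helper name `probe` is hypothetical and not part of Zeppelin:

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code served at `url`, or None if nothing answered.

    Hypothetical helper: 200 means the notebook UI is serving, 503 means the web
    server is up but the web application is unavailable, and None means no
    server accepted the connection at all.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass):
        # the server did reply, just with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: no HTTP reply at all.
        return None


if __name__ == "__main__":
    print(probe("http://localhost:8080"))
```

A 503 points at the web application failing to start, so the Zeppelin server log is the next place to look; `None` points at the daemon itself.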

jtlz2
9 votes, 1 answer
Apache Zeppelin - How to use Helium framework in Apache Zeppelin
Since Zeppelin 0.7, Zeppelin has supported Helium plugins/packages through the Helium framework. However, I am not able to view any of the plugins on the Helium page (localhost:8080/#/helium). As per this JIRA, I placed the sample Helium.json (available on S3)…

Nikhil Bhide
9 votes, 4 answers
How to add a jar in zeppelin?
How to add a jar in Zeppelin for the %hive interpreter?
I have tried
%z.dep('');
add jar
Also, the Zeppelin Hive interpreter throws a ClassNotFoundException.
Adding the jar to ./interpreter/hive/ throws a Thrift exception, while add jar says file not…

user 923227
9 votes, 1 answer
Is data returned in %jdbc paragraph available in subsequent paragraphs?
If a paragraph returns data from the %jdbc interpreter, is that data available to following paragraphs that use other interpreters?
e.g.
%jdbc(psql)
select * from `table`
then
%python
# load / access data here
x = ...
In the same way that a…

oracle certified professional
9 votes, 2 answers
Export data in CSV using Zeppelin
I need to export data in CSV format from my %sql interpreter in Zeppelin. How can I do so?
I need to add a button that, when clicked, exports the data shown in the %sql interpreter's graphs as CSV on the client side.
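Whatever ends up triggering the download, the underlying step is turning the query result (a header plus rows) into CSV text. A minimal sketch with Python's standard `csv` module, assuming the result is already available in Python as a list of tuples; `rows_to_csv` and the sample values are hypothetical:

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Serialize a header and result rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    # csv.writer quotes fields containing commas, quotes or newlines as needed.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(["name", "total"], [("a,b", 3), ("c", 4)]))
```

Letting `csv.writer` handle quoting matters here: a naive `",".join(...)` would split the `a,b` value into two columns.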

Nipun
9 votes, 1 answer
Zeppelin Notebook Storage in local Git repository
I have followed the instructions for setting up Zeppelin Notebook Storage in a local Git repository here:
https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/storage/storage.html#Git
But I am still unclear about how I can store…

Eoin Lane
9 votes, 2 answers
Apache Zeppelin: using variables across paragraphs
I am trying to accomplish the following use case in Apache Zeppelin:
When I write an SQL query, for example
%sql SELECT * FROM table1 WHERE column1 = ${column1=1,1|2|3|4}
I get a combo box displayed with these values (1,2,3,4) as options.
What I…
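The `${column1=1,1|2|3|4}` placeholder follows Zeppelin's select-form syntax: the form name, a default value after `=`, and the `|`-separated options after the comma. The Python sketch below is a hypothetical model of that substitution, not Zeppelin's implementation; it shows how one chosen value (or the default) is spliced into the query text, and that the same selection can be applied to more than one query template:

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern for ${name=default,opt1|opt2|...}.
# Options written as value(DisplayName) are not handled in this sketch.
SELECT_FORM = re.compile(r"\$\{(\w+)=([^,}]*),([^}]*)\}")


def parse_select_forms(query: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map each select-form name in `query` to its (default, options) pair."""
    return {
        name: (default, options.split("|"))
        for name, default, options in SELECT_FORM.findall(query)
    }


def render(query: str, chosen: Optional[Dict[str, str]] = None) -> str:
    """Replace every select form with the chosen value, or its default if none was chosen."""
    chosen = chosen or {}

    def substitute(match: "re.Match[str]") -> str:
        name, default, options = match.groups()
        value = chosen.get(name, default)
        if value not in options.split("|"):
            raise ValueError(f"{value!r} is not an option of form {name!r}")
        return value

    return SELECT_FORM.sub(substitute, query)


if __name__ == "__main__":
    query = "SELECT * FROM table1 WHERE column1 = ${column1=1,1|2|3|4}"
    print(parse_select_forms(query))        # {'column1': ('1', ['1', '2', '3', '4'])}
    print(render(query, {"column1": "3"}))  # SELECT * FROM table1 WHERE column1 = 3
```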

kunalc92
8 votes, 1 answer
How to set spark.driver.memory for Spark/Zeppelin on EMR
When using EMR (with Spark and Zeppelin), changing spark.driver.memory in the Zeppelin Spark interpreter settings has no effect.
I wonder what the best and quickest way is to set the Spark driver memory when using the EMR web interface (not the AWS CLI) to create…
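EMR takes software settings as a JSON list of configuration objects, each with a `Classification` and a `Properties` map (plus optional nested `Configurations`), and the console's software-settings field accepts such JSON when a cluster is created. A minimal Python sketch that builds one, assuming the `spark-defaults` classification is where `spark.driver.memory` should go for the Zeppelin Spark interpreter on this setup; the helper name and the `5g` value are placeholders:

```python
import json
from typing import Dict, List


def spark_defaults_config(properties: Dict[str, str]) -> List[dict]:
    """Build an EMR configuration list with a single spark-defaults classification (hypothetical helper)."""
    return [{"Classification": "spark-defaults", "Properties": dict(properties)}]


if __name__ == "__main__":
    # "5g" is an illustrative value; size it to the master node's memory.
    config = spark_defaults_config({"spark.driver.memory": "5g"})
    print(json.dumps(config, indent=2))
```

Whether the interpreter picks the setting up depends on how Zeppelin launches Spark on that EMR release, so it is worth checking the Environment tab of the Spark UI after the cluster starts.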

Rami
8 votes, 1 answer
How can I select a stable subset of rows from a Spark DataFrame?
I've loaded a file into a DataFrame in Zeppelin notebooks like this:
val df = spark.read.format("com.databricks.spark.csv").load("some_file").toDF("c1", "c2", "c3")
This DataFrame has >10 million rows, and I would like to start work with just a…
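One way to get a subset that does not change between runs is to select rows by a deterministic hash of a key column rather than by position, since `limit` or an unseeded sample may return different rows depending on partitioning. In Spark itself the usual tools are `DataFrame.sample` with a fixed seed or a filter on a hash of a column; the plain-Python sketch below shows only the hash-bucket idea, assuming each row has a stable key (here the first column, the hypothetical `c1` value):

```python
import hashlib
from typing import Iterable, Iterator, Sequence


def in_sample(key: str, percent: int) -> bool:
    """Return True if `key` hashes into the first `percent` of 100 buckets.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and would pick a different subset on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def stable_subset(
    rows: Iterable[Sequence[str]], percent: int, key_index: int = 0
) -> Iterator[Sequence[str]]:
    """Lazily yield the rows whose key column falls into the sample."""
    return (row for row in rows if in_sample(row[key_index], percent))
```

Because membership depends only on the key, rerunning the notebook selects the same rows, and raising `percent` from 10 to 20 only adds rows to the subset.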

Karmen
8 votes, 2 answers
Reading Avro File in Spark
I have read an Avro file into a Spark RDD and need to convert it into a SQL DataFrame. How do I do that?
This is what I have done so far:
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import…

Gayatri
8 votes, 2 answers
Configure Zeppelin's Spark Interpreter on EMR when starting a cluster
I am creating clusters on EMR and configuring Zeppelin to read the notebooks from S3. To do that, I am using a JSON object that looks like this:
[
  {
    "Classification": "zeppelin-env",
    "Properties": {
    },
    "Configurations": [
      {
        …

Rami
8 votes, 2 answers
Permission denied: user=zeppelin while using %spark.pyspark interpreter in AWS EMR cluster
I have created a PySpark Structured Streaming program and am trying to execute it in a Zeppelin notebook:
%spark.pyspark
query_window = windowedCounts \
.writeStream \
.outputMode("complete") \
…

Pari Margu