I need to export data in csv format from my %sql interpreter in zeppelin. How can I do so? I need to add a button and on clicking on that it should export the data in csv as shown by the graphs in zeppelin in sql interpreter on the client side.
Asked
Active
Viewed 1.3k times
2 Answers
4
At the moment, this is not supported (Zeppelin 0.5.6). Still, it seems that this will be added in the next version (0.6.0). You can clone it from the Zeppelin git page, or you can use the next work-around that I'm using:
- You'll need the IDs of the notebook and paragraph which you try to export. You can get them by clicking on "Link this paragraph" in the options menu of the paragraph you want to export. When you do that, you'll get a new window. The IDs are in the url of the new window:
http://localhost:8080/#/notebook/{Notebook-ID}/paragraph/{Paragraph-ID}?asIframe
- Use the Zeppelin Notebook API. Send a HTTP-GET request to
http://localhost:8080/api/notebook/{Notebook-ID}/paragraph/{Paragraph-ID}
- The response is a json. The field
body.result.msg
is a string holding the result as TSV (Tab Separated Values). This is pretty much what you need (you can parse it and replace all\t
in the string with,
to get a CSV file).
A simple code can get you this worked-out in no time.
EDIT:
Here's a Python script which does exactly this. Call getTSV
and send it the url of the paragraph you get from clicking "Link this paragraph":
import requests
import json
def parseURL(paragraphUrl):
url = paragraphUrl.split(":8080")
address = url[0]
vals = url[1].split("/")
notebook = vals[3]
paragraph = vals[5].split("?")[0]
return [address, notebook, paragraph]
def getData(address, notebook, paragraph):
response = requests.get(address + ":8080/api/notebook/" + notebook + "/paragraph/" + paragraph)
return response.text
def getTSV(paragraphUrl):
# This function gets the same url that you get from clicking on "Link this paragraph"
[address, notebook, paragraph] = parseURL(paragraphUrl)
response = getData(address,notebook,paragraph)
return json.loads(response)["body"]["result"]["msg"]

shakedzy
- 2,853
- 5
- 32
- 62
-
2Pay attention Zeppelin displays only the first 1000 rows. – Thomas Decaux Jul 26 '17 at 08:39
-
Only in the SQL UI. This script gives all the data – shakedzy Aug 08 '17 at 07:33
-
Strange, in source I can read a such limitation (see https://github.com/apache/zeppelin/blob/master/jdbc/src/main/java/org/apache/zeppelin/jdbc/JDBCInterpreter.java#L540), isnit ? – Thomas Decaux Sep 13 '17 at 15:46
-
nothin yet, you can check the jira , maybe there is an issue about it. – Thomas Decaux Apr 30 '18 at 09:03
-
6 years later, they still haven't gotten around to it. I don't think it's going to happen. – Hack-R Aug 05 '22 at 20:46
1
Here is shakedzy's code ported to ruby, for anyone that may prefer it over python (like me):
require 'net/http'
require 'json'
def get_zeppelin_data(paragraph_url)
rx = %r{http://([\w\d]+):([\d]+)/?#/notebook/([\w\d]+)/paragraph/([\d\-_]+)}
address, port, notebook_id, paragraph_id = paragraph_url.match(rx).captures
api_url = "http://#{address}:#{port}/api/notebook/#{notebook_id}/paragraph/#{paragraph_id}"
result = Net::HTTP.get(URI(api_url))
data = JSON.parse(result)
data["body"]["result"]["msg"]
end

Zack
- 1,201
- 8
- 21