
I am using Livy to run the word-count example, packaged as a JAR file. It works fine and writes its output to HDFS. Now I want to get the result back to my HTML page. I am using Spark (Scala), sbt, HDFS, and Livy.

The GET /batches REST API only shows the log and the state.

How do I get the output results?

Or how can I read a file in HDFS using the Livy REST API? Please help me out with this.

Thanks in advance.

Divya Arya

2 Answers


If you check the status of the batch using curl, you will get the state of the Livy batch job, which will show as finished if the Spark driver launched successfully.
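As a minimal sketch of that status check, written in Python instead of curl: the snippet below calls Livy's GET /batches/{batchId}/state endpoint. The server address (http://localhost:8998, Livy's default port), the helper names, and the absence of authentication are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: Livy listens on its default port on this machine; replace as needed.
LIVY_URL = "http://localhost:8998"


def batch_state_url(livy_url: str, batch_id: int) -> str:
    """Build the URL for Livy's GET /batches/{batchId}/state endpoint."""
    return f"{livy_url.rstrip('/')}/batches/{batch_id}/state"


def get_batch_state(livy_url: str, batch_id: int) -> str:
    """Return the state string that Livy reports for one batch job."""
    with urllib.request.urlopen(batch_state_url(livy_url, batch_id)) as resp:
        # The response body is a small JSON object with "id" and "state" keys.
        return json.load(resp)["state"]
```

Calling get_batch_state(LIVY_URL, batch_id) with the id returned when the batch was submitted gives only the job state. It does not return the job's output, which is why the steps below read the result from HDFS separately.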

To read the output:

  1. SSH (for example with paramiko) into the machine where HDFS is running and run hdfs dfs -ls / to check the output and perform your desired tasks.

  2. Using the Livy REST API, write a script that does step 1 and submit it with a curl command to fetch the output from HDFS. In this case Livy will launch a separate Spark driver, and the output will appear in the STDOUT of the driver logs.

curl -vvv -u : :/batches -X POST --data '{"file": "http://"}' -H "Content-Type: application/json"

The first approach is the sure way of getting the output; I am not 100% sure how the second approach will behave.
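The first approach could look roughly like the sketch below. It assumes paramiko is installed, that the remote user can run the hdfs command, and that password login is allowed. It uses hdfs dfs -cat rather than hdfs dfs -ls so that the file contents come back. The function names and arguments are hypothetical.

```python
import shlex


def hdfs_cat_command(hdfs_path: str) -> str:
    """Build a shell-safe `hdfs dfs -cat` command for one HDFS path."""
    return f"hdfs dfs -cat {shlex.quote(hdfs_path)}"


def read_hdfs_over_ssh(host: str, user: str, password: str, hdfs_path: str) -> str:
    """Run `hdfs dfs -cat` on a remote host over SSH and return its stdout."""
    import paramiko  # third-party package: pip install paramiko

    client = paramiko.SSHClient()
    # Assumption: auto-accepting unknown host keys is tolerable for a test setup only.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password)
    try:
        _stdin, stdout, stderr = client.exec_command(hdfs_cat_command(hdfs_path))
        output = stdout.read().decode("utf-8")
        # A non-zero exit status means the hdfs command itself failed.
        if stdout.channel.recv_exit_status() != 0:
            raise RuntimeError(stderr.read().decode("utf-8"))
        return output
    finally:
        client.close()
```

The returned string can then be handed to whatever backend serves the HTML page. For large outputs, streaming the file would be a better fit than reading it into memory in one go.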

Aman Khare

You can use WebHDFS in your REST call. Have your admin enable WebHDFS first.

  1. Use the webHDFS URL
  2. Create an HttpURLConnection object
  3. Set the request method to GET

Then read the response by wrapping getInputStream() in a BufferedReader.
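The same steps can be sketched in Python, using urllib.request in place of Java's HttpURLConnection. The snippet builds a WebHDFS URL with op=OPEN and issues a GET. The NameNode address, the optional user.name value, and the function names are assumptions for illustration, and the HDFS path is expected to start with a slash.

```python
import urllib.parse
import urllib.request


def webhdfs_open_url(namenode: str, hdfs_path: str, user: str = "") -> str:
    """Build the WebHDFS URL that opens (reads) one HDFS file."""
    params = {"op": "OPEN"}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a query parameter.
        params["user.name"] = user
    path = urllib.parse.quote(hdfs_path)  # escapes special characters, keeps '/'
    return f"{namenode.rstrip('/')}/webhdfs/v1{path}?{urllib.parse.urlencode(params)}"


def read_hdfs_file(namenode: str, hdfs_path: str, user: str = "") -> str:
    """GET the file through WebHDFS and return its contents as text."""
    url = webhdfs_open_url(namenode, hdfs_path, user)
    # The NameNode answers with a redirect to a DataNode; urlopen follows it.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Because the NameNode redirects the request, the machine making the call must be able to reach the DataNodes as well as the NameNode.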

Jérôme
viv_tony