7
  1. How do we run the notebook from the command line?

  2. Further to 1, how would I pass command-line arguments into the notebook, i.e. access the command-line args from within the notebook code?

samthebest
thousif ahmed

2 Answers

7

I had the same issue and managed to work out how to use the API to run a notebook using curl. As for passing in command-line arguments, I think there is simply no way to do that; you will have to use some sort of shared state on the server (e.g. have the notebook read from a file, and modify that file before each run).
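The shared-state idea can be sketched roughly as follows. This is only an illustration: the file path, the `key=value` format, and the `write_notebook_params` helper are all hypothetical, and it assumes the script runs somewhere that can write to a path the notebook can also read.

```shell
# Hypothetical location; the notebook code must be written to read this same path.
params_file="${PARAMS_FILE:-/tmp/zeppelin_notebook_params.properties}"

# Write each key=value argument on its own line, replacing any earlier file.
write_notebook_params() {
    : > "${params_file}"
    for pair in "$@"; do
        printf '%s\n' "${pair}" >> "${params_file}"
    done
}

# Example "arguments" (made-up values) to be picked up by the notebook.
write_notebook_params "input_path=/data/in" "run_date=2016-05-17"
```

The notebook would then parse that file in its first paragraph, so the file has to be written before the run is triggered.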

Anyway, this is how I managed to run a notebook; it assumes jq is installed. Pretty involved :(

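# List the IDs of all interpreter settings (for inspection only).
curl -XGET http://${ip}:8080/api/interpreter/setting | jq '.body[] | .id'

# Capture the same IDs, one quoted ID per line.
interpreter_settings_ids=`curl -XGET http://${ip}:8080/api/interpreter/setting | jq '.body[] | .id'`

# Join them into a JSON array such as ["id1","id2"].
id_array="["`echo ${interpreter_settings_ids} | tr ' ' ','`"]"

# Bind all interpreters to the notebook; the quotes stop the shell from globbing or splitting the array.
curl -XPUT -d "${id_array}" http://${ip}:8080/api/notebook/interpreter/bind/${notebook_id}

# Run every paragraph in the notebook.
curl -XPOST http://${ip}:8080/api/notebook/job/${notebook_id}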

If someone has manually clicked the "save" button for the interpreter binding then only the last command is required.

UPDATE:

OK, I think you can poll the status of the running notebook in a loop to determine whether it failed; see: https://github.com/eBay/Zeppelin/blob/master/docs/rest-api/rest-notebook.md

For example:

function job_success {
    # Fetch the paragraph statuses once so both counts come from the same snapshot.
    statuses=`curl -XGET http://${ip}:8080/api/notebook/job/${notebook_id} 2>/dev/null | jq '.body[] | .status'`
    num_cells=`echo "${statuses}" | grep -c .`
    num_successes=`echo "${statuses}" | grep -c FINISHED`
    # Require at least one paragraph so a failed request is not counted as success.
    test ${num_cells} -gt 0 && test ${num_cells} = ${num_successes}
}

function job_fail {
    curl -XGET http://${ip}:8080/api/notebook/job/${notebook_id} 2>/dev/null | jq '.body[] | .status' | grep ERROR
}

until job_success || job_fail
do
    sleep 10
done
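The loop above runs forever if the notebook never finishes, and it does not tell the caller which outcome occurred. A possible variant with an exit status and a poll limit is sketched below. The helper names (`fetch_statuses`, `classify_statuses`, `wait_for_job`) and the `max_polls` limit are made up for illustration, and treating `ABORT` as a failure is an assumption about the status values the server reports.

```shell
# Print one paragraph status per line. Same endpoint as the functions above;
# ip and notebook_id must be set by the caller.
fetch_statuses() {
    curl -s -XGET "http://${ip}:8080/api/notebook/job/${notebook_id}" | jq -r '.body[] | .status'
}

# Classify a newline-separated status list: prints FAILED, SUCCESS or RUNNING.
classify_statuses() {
    statuses="$1"
    if printf '%s\n' "${statuses}" | grep -q -e ERROR -e ABORT; then
        echo FAILED
    elif [ -n "${statuses}" ] && ! printf '%s\n' "${statuses}" | grep -qv FINISHED; then
        echo SUCCESS
    else
        echo RUNNING
    fi
}

# Poll until the job succeeds (return 0), fails (return 1), or the
# hypothetical max_polls limit is reached (return 2).
wait_for_job() {
    max_polls="${1:-60}"
    interval="${2:-10}"
    polls=0
    while [ "${polls}" -lt "${max_polls}" ]; do
        case "$(classify_statuses "$(fetch_statuses)")" in
            SUCCESS) return 0 ;;
            FAILED)  return 1 ;;
        esac
        polls=$((polls + 1))
        sleep "${interval}"
    done
    return 2
}
```

A caller could then write something like `wait_for_job 60 10 || exit 1` after triggering the run, so a failed or stuck notebook fails the surrounding script.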
samthebest
  • Did you just write an answer then re-write the question to match your answer? – Lightness Races in Orbit May 17 '16 at 17:53
  • @LightnessRacesinOrbit No, you can just look at the history to see that the question remains semantically the same but easier to google for. I'm actually hoping someone else comes up with a better answer. – samthebest May 18 '16 at 13:32
  • @LightnessRacesinOrbit The semantics is identical, I express the point in less words with greater elegance - it is objectively better and substantively equivalent. However, I may go beyond the "spirit" of the post, but I have no idea what that means. Give me a definition of "spirit". – samthebest May 18 '16 at 14:42
  • `id_array=\`curl -XGET http://${ip}:8080/api/interpreter/setting | jq '.body[].id' | jq -scr '.'\`` is better than `tr` magic with shell text concatenation. We want a JSON array, then we should make `jq` give us a JSON array – user1129682 Sep 20 '18 at 14:06
  • actually `id_array=\`curl -XGET http://${ip}:8080/api/interpreter/setting | jq -cr '[ .body[].id ]'\`` is even shorter – user1129682 Sep 20 '18 at 14:14
  • How can I get the interpreter ID for Python or Spark? The first curl returns all the IDs without indicating which interpreter each one relates to – mRyan Jan 27 '21 at 12:36
2

As of version 0.7.3 and perhaps earlier, Zeppelin has a REST API that lets you run notebooks. Your shell script can use curl to access the API.

The API includes methods to delete a paragraph and to insert a paragraph at a particular index. This allows you to express all your "parameters" as variables in paragraph 0 and then use them in later paragraphs. Make 3 calls to the REST API in this order:

  1. Delete the notebook's current paragraph 0.
  2. Insert a new paragraph containing variable assignments at index 0.
  3. Run the notebook.
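The three calls might look roughly like the sketch below. It is not taken from this answer: the endpoint paths and the `title`/`text`/`index` payload fields reflect my reading of the Zeppelin notebook REST documentation and should be checked against your installed version. The `ip` and `notebook_id` variables, port 8080, and the helper names are assumptions, and jq is assumed to be installed.

```shell
# ip and notebook_id are assumed to be set by the caller; 8080 is an assumed port.
base_url() {
    echo "http://${ip}:8080/api/notebook"
}

# JSON body for the new paragraph; jq takes care of quoting the text.
param_paragraph_payload() {
    jq -cn --arg text "$1" '{title: "parameters", text: $text, index: 0}'
}

run_with_params() {
    param_text="$1"
    # Look up the ID of the current first paragraph.
    first_id=$(curl -s -XGET "$(base_url)/${notebook_id}" | jq -r '.body.paragraphs[0].id')
    # 1. Delete the notebook's current paragraph 0.
    curl -s -XDELETE "$(base_url)/${notebook_id}/paragraph/${first_id}"
    # 2. Insert the new parameter paragraph at index 0.
    curl -s -XPOST -H 'Content-Type: application/json' \
        -d "$(param_paragraph_payload "${param_text}")" \
        "$(base_url)/${notebook_id}/paragraph"
    # 3. Run every paragraph in the notebook.
    curl -s -XPOST "$(base_url)/job/${notebook_id}"
}
```

For instance, `run_with_params 'val inputPath = "/data/in"'` would replace paragraph 0 with that assignment (assuming the notebook's default interpreter is Scala) and then start the run.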
ChrisFal