
Let's say I want to keep creating a session for every Spark job that is submitted to YARN. Every connection has a unique user, who keeps polling the status and posting statements to a session. How do I calculate how many active sessions Livy can have at any given time? Is it based on the livy.spark.driver size that I configure? What are the other parameters involved in this calculation?

Anandkumar

1 Answer


YARN has a scheduler that allocates AM containers, and Livy initializes each accepted request on YARN with whatever resources are available on the cluster/standalone server; see the YARN scheduler documentation. In yarn-cluster mode every Livy session holds one Spark driver (AM) container, so the number of concurrently active sessions is bounded by your YARN queue capacity divided by the per-driver footprint (driver memory plus overhead, and driver cores) that you configure. livy-client.conf should also be configured so that long-running jobs still yield a response.
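As a rough illustration of the calculation the question asks about, here is a back-of-envelope sketch assuming yarn-cluster mode, where each session's Spark driver runs in its own YARN AM container. All queue and driver numbers below are made-up example values, not from the original answer; substitute your own queue capacity and spark.driver.* settings.

# Hypothetical YARN queue capacity (example values, adjust to your cluster).
queue_memory_mb = 64 * 1024   # 64 GB available to the queue
queue_vcores = 32

# Per-session driver footprint: in yarn-cluster mode the driver runs in the
# AM container, so each session consumes driver memory + overhead and cores.
driver_memory_mb = 2048                                       # spark.driver.memory
driver_overhead_mb = max(384, int(driver_memory_mb * 0.10))   # spark.driver.memoryOverhead default
driver_cores = 1                                              # spark.driver.cores

max_by_memory = queue_memory_mb // (driver_memory_mb + driver_overhead_mb)
max_by_cores = queue_vcores // driver_cores

# Upper bound on concurrently running sessions' driver containers.
max_sessions = min(max_by_memory, max_by_cores)
print(f"At most ~{max_sessions} concurrent driver containers")

In practice the ceiling is lower: each session's executors also consume queue resources, and on the CapacityScheduler the share of a queue that AM containers may occupy is capped by yarn.scheduler.capacity.maximum-am-resource-percent (10% by default).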

livy-client.conf

Time between status checks for a cancelled job

livy.rsc.job-cancel.trigger-interval = 100ms

Time before a cancelled job is forced into the Cancelled state

livy.rsc.job-cancel.timeout = 60m
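One server-side setting also worth checking (my addition, not from the original answer) is the idle-session timeout in livy.conf, since idle sessions that are never reclaimed keep holding YARN driver containers; the value shown is the shipped default from the livy.conf template, so verify it against your installation.

livy.conf

Idle sessions are garbage-collected after this interval, freeing their driver containers

livy.server.session.timeout = 1h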

Here is some sample code; you should filter the sessions whose state is 'busy' from the output.

import requests

host = "http://{livy_host}:8998"  # replace {livy_host} with your Livy server
response = requests.get(host + '/sessions/')

Sample output of response.content:

b'{"from":0,"total":1,"sessions":[{"id":3,"appId":"application_1566223151385_0085","owner":null,"proxyUser":null,"state":"busy","kind":"pyspark","appInfo":{"driverLogUrl":"{livy_host}:8042/node/containerlogs/container_e182_1566223151385_0085_01_000001/mapr","sparkUiUrl":"{livy_host}:8088/proxy/application_1566223151385_0085/"},"log":[""]}]}'

Count the sessions that are currently busy:

sum(session['state'] == 'busy' for session in response.json()['sessions'])
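Since the question also mentions posting statements to a session and polling their status, here is a minimal sketch of that flow against Livy's documented REST endpoints (POST /sessions/{id}/statements, then GET the statement until it finishes); the session id and the submitted code are placeholders of my own.

import time
import requests

host = "http://{livy_host}:8998"  # same placeholder as above
session_id = 3                    # an existing session id from GET /sessions/

# Submit a statement to the session.
resp = requests.post(host + f"/sessions/{session_id}/statements",
                     json={"code": "1 + 1"})
statement_id = resp.json()["id"]

# Poll until the statement leaves the waiting/running states.
while True:
    stmt = requests.get(
        host + f"/sessions/{session_id}/statements/{statement_id}").json()
    if stmt["state"] in ("available", "error", "cancelled"):
        break
    time.sleep(1)

print(stmt.get("output"))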
oetzi