
As we know, creating an Apache Livy connection is expensive: it creates a new application and uploads the task files.

In my case, users submit jobs through my web API, which is written in Java; I then use the Apache Livy client to submit each job to Spark.

I want to keep one, or a fixed number of, Livy client instances and be able to check each client's state, as with a connection pool.

Moon.Hou

1 Answer


If your job is a finite unit of work, you should be using Livy's Batch abstraction, not Session. Sessions are for interactive work (e.g., Jupyter Notebook or Apache Zeppelin), where users submit some queries, evaluate the results, and submit some more. A Batch, on the other hand, most closely resembles what you'd generally submit with the spark-submit executable; it ends on its own when the job's tasks are done and cleans up after itself, so there's no need for a connection pool. That said, a connection pool makes little sense for a Session either, as each session has its own state (variables defined by earlier statements run in that session), and that state is not (and shouldn't be) shared.
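For illustration, here is a minimal sketch of submitting a batch from a Java web API through Livy's REST endpoint (POST /batches) instead of keeping Livy client instances around. It uses only the JDK's java.net.http client (Java 11+). The class name LivyBatchSketch, the jar path, the job class com.example.MyJob, and the hand-rolled JSON escaping are all assumptions made for this example, not anything from the question; a real service would likely use a JSON library and add error handling.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds and sends a Livy batch submission.
class LivyBatchSketch {

    // Minimal JSON string quoting (handles backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body for POST /batches: "file", "className" and "args" are fields of Livy's batch request.
    static String buildBatchPayload(String file, String className, List<String> args) {
        String argList = args.stream()
                .map(LivyBatchSketch::quote)
                .collect(Collectors.joining(","));
        return "{\"file\":" + quote(file)
                + ",\"className\":" + quote(className)
                + ",\"args\":[" + argList + "]}";
    }

    // Builds the HTTP request without sending it, so it can be inspected or tested offline.
    static HttpRequest buildSubmitRequest(String livyUrl, String payload) {
        return HttpRequest.newBuilder(URI.create(livyUrl + "/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }

    // Sends the request and returns Livy's raw JSON response body.
    static String submit(HttpClient http, String livyUrl, String payload) throws Exception {
        HttpResponse<String> resp = http.send(
                buildSubmitRequest(livyUrl, payload),
                HttpResponse.BodyHandlers.ofString());
        return resp.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed job location and entry point, for illustration only.
        String payload = buildBatchPayload(
                "hdfs:///jobs/my-job.jar", "com.example.MyJob", List.of("2024-01-01"));
        System.out.println(payload);
        // Only contact a server if a Livy base URL (e.g. http://localhost:8998) is passed in.
        if (args.length > 0) {
            System.out.println(submit(HttpClient.newHttpClient(), args[0], payload));
        }
    }
}
```

With this approach the only long-lived object the web API needs is a single HttpClient, which can be reused across requests; each batch is its own Spark application, and its state can be polled afterwards via GET /batches/{batchId}/state.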

Alex Savitsky