I'm quite new to Big Data. Currently, I'm working on a CLI project that performs some text parsing using Apache Spark.
When a command is typed, a new SparkContext is instantiated and some files are read from an HDFS instance. However, Spark takes too much time to initialize the SparkContext (or even a SparkSession object).
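To illustrate, each command currently does roughly the following (the HDFS path and the parsing step here are placeholders, not my real code):

```python
from pyspark.sql import SparkSession

# A new session (and its underlying SparkContext) is created on every
# command; this startup is where most of the time goes.
spark = SparkSession.builder \
    .appName("text-parser") \
    .getOrCreate()

# Placeholder path and parsing logic for illustration only
lines = spark.read.text("hdfs://namenode:9000/data/input.txt")
parsed = lines.filter(lines.value.contains("ERROR"))
parsed.show()

spark.stop()  # session is torn down, so the next command pays the cost again
```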
So, my question is: is there a way to reuse a SparkContext instance across these commands to reduce this overhead? I've heard about Spark Job Server, but deploying a local server has been too hard so far, since its main guide is a bit confusing.
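The only workaround I've come up with so far is keeping one long-running Python process that holds a single session and reads commands in a loop, instead of starting a fresh process per command. A rough sketch of that idea (the `parse <path>` command format is made up for illustration):

```python
from pyspark.sql import SparkSession

# Create the session once, outside the command loop
spark = SparkSession.builder.appName("text-parser").getOrCreate()

while True:
    command = input("parser> ").strip()
    if command in ("quit", "exit"):
        break
    # Dispatch on the command; the expensive context is reused every time
    if command.startswith("parse "):
        path = command.split(" ", 1)[1]
        df = spark.read.text(path)
        print(df.count())

spark.stop()
```

But I'd prefer something that works with separate CLI invocations, which is why I was looking at Spark Job Server.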
Thank you.
P.S.: I'm using PySpark.