My question is very basic: my code is working fine, but I am not clear on these two points.

1) When we submit a PySpark job using spark-submit, do we need to create a SparkSession object like this in my script?

from pyspark.sql import SparkSession, SQLContext

spark = SparkSession \
    .builder \
    .enableHiveSupport() \
    .appName("test") \
    .getOrCreate()
print(spark)

# SQLContext expects a SparkContext, not a SparkSession
sqlContext = SQLContext(spark.sparkContext)

Or can I directly access the SparkSession object in my script without creating it?

from pyspark.sql import SparkSession, SQLContext

print(spark)  # this may be sc instead of spark; not sure, I am using Spark 2
sqlContext = SQLContext(spark.sparkContext)

And if a SparkSession object is already available, how can I add config properties such as the ones below, or enable Hive support?

spark = SparkSession \
    .builder \
    .enableHiveSupport() \
    .config(conf=SparkConf().set("spark.driver.maxResultSize", "2g")) \
    .appName("test") \
    .getOrCreate()

2) Another approach is, without using spark-submit, to write my Python code to create the SparkSession object and use it, roughly as in the sketch below.
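For example, a standalone script run with plain python rather than spark-submit might look like this (a minimal sketch; the local master, app name, and file name are just illustrative):

# run with: python my_job.py  (no spark-submit involved)
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local[*]") \
    .appName("standalone-test") \
    .getOrCreate()

spark.range(5).show()  # quick sanity check
spark.stop()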

My doubt is: if I submit the job using spark-submit and create the SparkSession object as mentioned above, am I ending up creating two Spark sessions?

It would be very helpful if someone could explain the added advantage of using spark-submit over the method in step 2. And do I need to create a SparkSession object if I invoke the job using spark-submit from the command line?

  • I did not understand step 2. Can you explain that? How would you use that script? – mrsrinivas Sep 12 '17 at 12:34
  • like this probably https://stackoverflow.com/questions/41926219/error-message-when-launching-pyspark-from-jupyter-notebook-on-windows – user07 Sep 12 '17 at 14:29
  • Hi, if the answer below has solved your problem please consider [accepting it](http://meta.stackexchange.com/q/5234/179419) or adding your own solution, so that it indicates to the wider community that you've found a solution. – mrsrinivas Oct 07 '17 at 09:21
  • Hi, the answer below has cleared a few of my doubts, but I need to do a little more testing before I confirm. – user07 Oct 09 '17 at 10:13

1 Answer

When we submit any PySpark job using spark-submit, do we need to create a SparkSession object?

Yes. It is only unnecessary in the shells (pyspark, spark-shell), where a session is created for you automatically.
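To make that concrete (a sketch; the file name is illustrative):

# In the pyspark shell a session already exists -- no builder call needed:
#   >>> spark
#   <pyspark.sql.session.SparkSession object at 0x...>

# In a script run via `spark-submit my_job.py`, create it yourself:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()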

My doubt is: if I submit the job using spark-submit and create the SparkSession object as mentioned above, am I ending up creating two Spark sessions?

TL;DR: No.

If we look at the code you have written:

spark = SparkSession \
    .builder \
    .enableHiveSupport() \
    .config(conf=SparkConf().set("spark.driver.maxResultSize", "2g")) \
    .appName("test") \
    .getOrCreate()

Observe the getOrCreate(); it ensures that at any time only one SparkSession object (spark) exists.
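You can check this yourself: a second builder call returns the very same session instead of creating a new one (a quick sketch):

from pyspark.sql import SparkSession

spark1 = SparkSession.builder.appName("test").getOrCreate()
spark2 = SparkSession.builder.appName("another-name").getOrCreate()

print(spark1 is spark2)  # True: getOrCreate() reused the existing session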

I would recommend creating the context/session locally in the script, which keeps the code self-contained (it does not depend on outside sources for the object).
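One way to structure that (a sketch of the pattern, not the only option; enableHiveSupport() assumes Hive is configured on the cluster):

from pyspark.sql import SparkSession

def main(spark):
    # application logic goes here
    spark.range(10).show()

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .appName("test") \
        .getOrCreate()
    main(spark)
    spark.stop()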
