
I am submitting a PySpark/Spark SQL script using spark-submit, and I need to pass a runtime variable (a database name) to the script.

spark-submit command:

spark-submit --conf database_parameter=my_database my_pyspark_script.py

PySpark script:

database_parameter = SparkContext.getConf().get("database_parameter")           

DF = sqlContext.sql("SELECT count(*) FROM database_parameter.table_name")

Spark version: 1.5.2
Python version: 2.7.5

The approach I am trying does not work. The error is: `AttributeError: type object 'SparkConf' has no attribute 'getConf'`.

I am looking for a way to pass runtime variables to the script when calling it through spark-submit, and to use those variables inside the script.

Shantanu Sharma

1 Answer


You can use the usual `sys.argv`:

args.py

#!/usr/bin/python

import sys
print sys.argv[1]

Then you spark-submit it:

spark-submit args.py my_database 

This will print:

my_database
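Note that the value also has to be interpolated into the SQL text; in the question's snippet, `database_parameter` inside the query string is taken literally. A minimal sketch of wiring the argument into the query (the `build_query` helper and the script name are hypothetical, for illustration):

```python
#!/usr/bin/python
import sys

def build_query(argv):
    # argv[0] is the script path; argv[1] is the first argument
    # passed after it on the spark-submit command line.
    database_parameter = argv[1]
    # Interpolate the runtime value into the SQL string; a bare
    # "database_parameter" inside the string would not be substituted.
    return "SELECT count(*) FROM {0}.table_name".format(database_parameter)

if __name__ == "__main__":
    print(build_query(sys.argv))
```

Running `spark-submit my_pyspark_script.py my_database` would then produce the query `SELECT count(*) FROM my_database.table_name`, which can be handed to `sqlContext.sql(...)`.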
philantrovert
  • Thanks for your response, there was some way to do it through --conf in spark-submit and get the value in script through getconf, but I am not able to recall that. – Shantanu Sharma Jul 25 '17 at 09:47
  • The parameters you pass through `--conf` should be spark related otherwise you will get `Warning: Ignoring non-spark config property` – philantrovert Jul 25 '17 at 09:55
  • Yeah, I am getting this warning. But there was something similar for runtime variable as well, If I can recall correctly. – Shantanu Sharma Jul 25 '17 at 10:02
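For completeness, the `--conf` route the comments allude to does exist, but only for keys in the `spark.` namespace; anything else is dropped with the warning quoted above. A sketch, assuming a Spark 1.x installation and a hypothetical key name `spark.database_parameter` (this requires a running Spark environment, so it is shown without expected output):

```python
# Submitted as:
#   spark-submit --conf spark.database_parameter=my_database my_pyspark_script.py
from pyspark import SparkContext
from pyspark.sql import SQLContext

# getConf() must be called on a SparkContext *instance*, not on the
# class -- calling it on the class is what raises the AttributeError
# from the question.
sc = SparkContext()
sqlContext = SQLContext(sc)

# Only keys prefixed with "spark." survive spark-submit's filtering.
database_parameter = sc.getConf().get("spark.database_parameter")

df = sqlContext.sql(
    "SELECT count(*) FROM {0}.table_name".format(database_parameter))
```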