I am trying to execute a PySpark job and need to pass some named arguments to my program. Any ideas how to do this?
Can you provide more details - named args for what? Passed on the command line? If yes, just use any Python command-line parsing library – Alex Ott Jul 20 '20 at 17:47
This is what you are looking for, I think: https://stackoverflow.com/questions/32217160/can-i-add-arguments-to-python-code-when-i-submit-spark-job – tarun Jul 21 '20 at 05:23
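For completeness, the linked question covers plain positional arguments read via sys.argv; a minimal sketch (the script name and paths here are hypothetical):
# positional_args.py -- read positional arguments passed after the script name,
# e.g. spark-submit positional_args.py /data/input /data/output
import sys

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PositionalArgs").getOrCreate()

input_path = sys.argv[1]   # first argument after the script path
output_path = sys.argv[2]  # second argument

df = spark.read.json(input_path)
df.write.csv(output_path)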
1 Answer
Use the argparse API to read the named arguments passed to spark-submit. The code below works with a spark-submit command like:
spark-submit --master yarn --deploy-mode cluster --num-executors 2 --executor-memory 1G --executor-cores 2 --driver-memory 1G spark_read_write.py --inputpath <input path> --outputpath <output path> --configpath <config path>
# Include standard modules
import argparse

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ReadWriteSpark").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

# Initiate the parser
parser = argparse.ArgumentParser()

# Add the named arguments
parser.add_argument("--inputpath", help="input data path")
parser.add_argument("--outputpath", help="output data path")
parser.add_argument("--configpath", help="configuration file path")

# Read arguments from the command line
args = parser.parse_args()

# Check for --configpath
if args.configpath:
    configpath = args.configpath

# Check for --inputpath
if args.inputpath:
    inputpath = args.inputpath

# Check for --outputpath
if args.outputpath:
    outputpath = args.outputpath

df = spark.read.format("json").load(inputpath)
df.write.csv(outputpath)
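As a variation, argparse can make each option mandatory, so a missing path fails immediately with a usage message instead of an undefined-variable error later in the job:
parser = argparse.ArgumentParser()
# required=True makes argparse exit with a usage error if the option is absent
parser.add_argument("--inputpath", required=True, help="input data path")
parser.add_argument("--outputpath", required=True, help="output data path")
parser.add_argument("--configpath", required=True, help="configuration file path")
args = parser.parse_args()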

– sathya