3

I am trying to execute a PySpark program, and I need to use some named args in it. Any ideas how to solve this issue?

  • 1
    Can you provide more details - named args for what? Passed in command-line? If yes, just use any Python command-line parsing library – Alex Ott Jul 20 '20 at 17:47
  • This is what you are looking for ,i think https://stackoverflow.com/questions/32217160/can-i-add-arguments-to-python-code-when-i-submit-spark-job – tarun Jul 21 '20 at 05:23

1 Answer

7

Use the `argparse` module's `ArgumentParser` API to read the named arguments passed after the script on the `spark-submit` command line. The code below works with `spark-submit`:

spark-submit --master yarn --deploy-mode cluster --num-executors 2 --executor-memory 1G --executor-cores 2 --driver-memory 1G  spark_read_write.py --inputpath <input path> --outputpath <output path> --configpath <config path>
# Include standard modules
import argparse

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ReadWriteSpark").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

# Initiate the parser
parser = argparse.ArgumentParser()

# Add long and short argument
parser.add_argument("--inputpath", "-inputpath", help="input path")
parser.add_argument("--outputpath", "-outputpath", help="output path")
parser.add_argument("--configpath", "-configpath", help="config path")

# Read arguments from the command line
args = parser.parse_args()

# Check for --configpath
if args.configpath:
    configpath = args.configpath
# Check for --inputpath
if args.inputpath:
    inputpath=args.inputpath
# Check for --outputpath
if args.outputpath:
    outputpath=args.outputpath

df = spark.read.format("json").load(inputpath)

df.write.csv(outputpath)
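As a quick sanity check independent of Spark, you can exercise the same parser by handing `parse_args` an explicit argument list instead of reading `sys.argv` — this mimics what `spark-submit` forwards to the script. The paths below are hypothetical placeholders:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--inputpath", "-inputpath", help="input path")
parser.add_argument("--outputpath", "-outputpath", help="output path")
parser.add_argument("--configpath", "-configpath", help="config path")

# Simulate the trailing arguments from the spark-submit command line
args = parser.parse_args([
    "--inputpath", "/data/in",
    "--outputpath", "/data/out",
    "--configpath", "/data/conf.json",
])

print(args.inputpath)   # /data/in
print(args.configpath)  # /data/conf.json
```

Any argument not supplied defaults to `None`, which is why the answer's `if args.configpath:` checks work before using the values.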
sathya