
I am writing a Scala project with classes that should be executable from spark-submit as a JAR (e.g. spark-submit --class org.project).

My problems are the following:

  1. I want to use the Spark context configuration that the user sets when doing a spark-submit, and optionally override some parameters such as the application name. Example: spark-submit --num-executors 6 --class org.project should set the number of executors to 6 in the Spark context configuration.

  2. I want to be able to pass optional parameters like --inputFile or --verbose to my project without interfering with the Spark parameters (possibly by avoiding name overlap).
    Example: spark-submit --num-executors 6 --class org.project --inputFile ./data/mystery.txt should pass "--inputFile ./data/mystery.txt" to the args input of the main method of org.project.

What my progress is in those problems is the following:

  1. I run val conf = new SparkConf().setAppName("project"); val sc = new SparkContext(conf); in my main method,
    but I am not sure whether this behaves as expected.

  2. Spark treats those optional arguments as arguments to spark-submit itself and reports an error.

Note.1: My project's main class currently does not inherit from any other class.

Note.2: I am new to the world of Spark and I couldn't find anything relevant with a basic search.

ysig

1 Answer


You will have to handle parameter parsing yourself. Here we use Scopt.

When you spark-submit your job, it must enter through an object's def main(args: Array[String]). Take these args, parse them using your favorite argument parser, set your SparkConf and SparkSession accordingly, and launch your process.
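
A minimal sketch of that flow, assuming scopt 3.x is on the classpath; the Config case class, the Project object, and its run method are made-up names, and the two options mirror the --inputFile and --verbose flags from the question:

    import org.apache.spark.sql.SparkSession
    import scopt.OptionParser

    // Hypothetical holder for the application's own options.
    case class Config(inputFile: String = "", verbose: Boolean = false)

    object Project {
      def main(args: Array[String]): Unit = {
        // Parse only the application's own flags; spark-submit has already consumed its own.
        val parser = new OptionParser[Config]("project") {
          opt[String]("inputFile")
            .action((x, c) => c.copy(inputFile = x))
            .text("path to the input file")
          opt[Unit]("verbose")
            .action((_, c) => c.copy(verbose = true))
            .text("enable verbose output")
        }

        parser.parse(args, Config()) match {
          case Some(config) =>
            // Settings passed through spark-submit (e.g. --num-executors) are picked up
            // automatically by the builder; here we only override the application name.
            val spark = SparkSession.builder()
              .appName("project")
              .getOrCreate()
            run(spark, config)
          case None =>
            sys.exit(1) // invalid arguments; scopt has already printed the usage text
        }
      }

      def run(spark: SparkSession, config: Config): Unit = {
        // actual job logic goes here, e.g. spark.read.textFile(config.inputFile)
      }
    }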

Spark has examples of that whole idea: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/DenseKMeans.scala
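
For the second problem, keep in mind that spark-submit treats everything before the application JAR as its own options and forwards everything after the JAR, untouched, to your main method's args. A hypothetical invocation (the JAR path is made up) would then look like:

    spark-submit \
      --num-executors 6 \
      --class org.project \
      target/scala-2.12/project.jar \
      --inputFile ./data/mystery.txt --verbose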

Michel Lemay