I have a Spark application that I run with the following command:

/usr/hdp/spark2-client/bin/spark-submit \
  --name 'App' \
  --class 'someFolder.SomeApp' \
  --master "yarn" \
  --deploy-mode "cluster" \
  --num-executors 4 \
  --executor-cores 3 \
  --executor-memory 4g \
  --conf spark.sql.shuffle.partitions=10 \
  --conf spark.default.parallelism=10 \
  --files 'hdfs:///file1','hdfs:///file2' \
  'assembly-0.25.0-3-ge05360d.jar' \
  'param1' 'param2'

Now I want to run it from Oozie. My question is: how can I represent a parameter such as --files, which takes a list of files, in the job.properties file?


2 Answers

If you run it through an Oozie shell action, you can list the files in the action definition in workflow.xml:

<file>hdfs:///file1#file1</file>
<file>hdfs:///file2#file2</file>
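
For context, here is a minimal sketch of how those <file> elements might sit inside a shell action in workflow.xml; the action name, script name, and script path are assumptions, not taken from the question:

<action name="run-spark-app">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- the wrapper script that calls spark-submit -->
        <exec>run_spark.sh</exec>
        <file>hdfs:///scripts/run_spark.sh#run_spark.sh</file>
        <!-- files localized into the launcher container's working directory -->
        <file>hdfs:///file1#file1</file>
        <file>hdfs:///file2#file2</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>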
Another way (a bit of a hack) is to put all of this in a shell script and invoke that shell script from Oozie to start the Spark application (you will, of course, need to move the shell script to some HDFS location first).

Use this link for copying files to the container: https://stackoverflow.com/a/22395918/1416616

If the above does not work, passing --files 'hdfs:///file1','hdfs:///file2' inside the shell script should work.
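
A minimal sketch of such a wrapper script, reusing the spark-submit command from the question (the script name run_spark.sh is an assumption):

#!/bin/bash
# run_spark.sh - wrapper invoked by the Oozie shell action.
# The two HDFS files are shipped to the application via --files,
# exactly as in the original spark-submit command.
/usr/hdp/spark2-client/bin/spark-submit \
  --name 'App' \
  --class 'someFolder.SomeApp' \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 3 \
  --executor-memory 4g \
  --conf spark.sql.shuffle.partitions=10 \
  --conf spark.default.parallelism=10 \
  --files 'hdfs:///file1','hdfs:///file2' \
  'assembly-0.25.0-3-ge05360d.jar' \
  'param1' 'param2'

Note that the application jar must also be reachable from the launcher container, for example by shipping it with another <file> element or by pointing spark-submit at an HDFS path.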

In your Spark application logs, always check the contents of the container to make sure the required files were actually copied into it.
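
One way to do that, assuming you use the wrapper-script approach, is to print the working directory at the start of the script and to pull the container logs afterwards (the application id below is a placeholder):

# at the top of run_spark.sh: show what YARN localized into the working directory
ls -la .

# after the run: inspect the aggregated container logs
yarn logs -applicationId application_1234567890123_0042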

Give it a try.