
I'm running a spark-submit like this:

spark-submit --deploy-mode client \
             --master yarn \
             --conf spark.files.overwrite=true \
             --conf spark.local.dir='/my/other/tmp/with/more/space' \
             --conf spark.executor.extraJavaOptions='-Djava.io.tmpdir=/my/other/tmp/with/more/space' \
             --conf spark.driver.extraJavaOptions='-Djava.io.tmpdir=/my/other/tmp/with/more/space' \
             --files hdfs:///a_big_file.binary,hdfs:///another_big_file.binary \
             ... etc.

I need to distribute these two binary files to the nodes this way, since they are parsed by an external *.dll/*.so in the workers which can only process local files.
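
For reference, the worker side would pick up the local copy of a --files entry roughly like this; this is only a sketch, and parseWithNativeLib is a made-up placeholder for the external *.dll/*.so call:

    import org.apache.spark.SparkFiles
    import org.apache.spark.sql.SparkSession

    object NativeParseJob {
      // Placeholder for the external *.dll/*.so entry point; it only illustrates
      // that the library is handed a plain local filesystem path.
      def parseWithNativeLib(localPath: String): Unit = ()

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("native-parse").getOrCreate()
        val sc = spark.sparkContext

        sc.parallelize(0 until sc.defaultParallelism).foreachPartition { _ =>
          // Absolute local path of the file shipped via --files / SparkContext.addFile.
          val localPath = SparkFiles.get("a_big_file.binary")
          parseWithNativeLib(localPath)
        }

        spark.stop()
      }
    }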

Now, running with --master yarn and --deploy-mode client, my node hosts the driver and therefore pulls the files from HDFS into the /tmp directory. Since these files are pretty big, they fill up my limited /tmp directory quite fast.

Can anybody point out the setting that changes this download path from /tmp to /my/other/tmp/with/more/space? I have already set spark.local.dir, spark.executor.extraJavaOptions and spark.driver.extraJavaOptions as shown above.
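
To see which setting actually wins, I can print on the driver where Spark stages the added files and which tmp settings the JVM picked up; a small diagnostic sketch (standard properties only, nothing cluster-specific):

    import org.apache.spark.SparkFiles
    import org.apache.spark.sql.SparkSession

    object WhereDoFilesGo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("where-do-files-go").getOrCreate()

        // Temp directory the driver JVM is actually using.
        println(s"java.io.tmpdir      = ${System.getProperty("java.io.tmpdir")}")
        // Scratch directory Spark was configured with, if any.
        println(s"spark.local.dir     = ${spark.sparkContext.getConf.getOption("spark.local.dir").getOrElse("<unset>")}")
        // Root directory holding the files added via --files on this JVM.
        println(s"SparkFiles root dir = ${SparkFiles.getRootDirectory()}")

        spark.stop()
      }
    }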

Thank you, Maffe


1 Answer


If you already have those files on HDFS, you should not pass them as a --files argument. --files should be used to create a local copy of some static data on each executor node. In your case, you should pass the file locations as Spark job arguments and access them later.
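
For illustration, a minimal sketch of that idea, assuming the application receives the hdfs:// locations as plain arguments and copies them itself into a directory with enough space (the directory and object names here are placeholders):

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    object LocalizeFromHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("localize-from-hdfs").getOrCreate()
        val hadoopConf = spark.sparkContext.hadoopConfiguration

        // HDFS locations passed as ordinary job arguments instead of --files.
        val hdfsPaths = args.toSeq
        val localDir  = "/my/other/tmp/with/more/space"

        hdfsPaths.foreach { p =>
          val src = new Path(p)
          val dst = new Path(localDir, src.getName)
          val fs  = FileSystem.get(new URI(p), hadoopConf)
          // Copy into a directory you control instead of Spark's default download dir.
          fs.copyToLocalFile(src, dst)
          println(s"copied $src -> $dst")
        }

        spark.stop()
      }
    }

If every executor needs its own copy, the same copy logic could also run inside a foreachPartition on the workers (rebuilding the Hadoop Configuration there, since it is not serializable), so each node localizes the files into the chosen directory.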

  • Hi, that is exactly what I pointed out. I need these files locally since the third-party tool can't load them from HDFS, only from a local file. – maffe Nov 27 '18 at 13:20