
I have a Spark job that I run using the spark-submit command. The jar that I use is hosted on HDFS, and I reference it directly in the spark-submit command via its HDFS file path.

Following the same logic, I'm trying to do the same for the --jars option, the --files option and also the extraClassPath option (in spark.conf), but it seems there is an issue with the fact that they point to an HDFS file path.

My command looks like this:

spark-submit \
  --class Main \
  --jars 'hdfs://path/externalLib.jar' \
  --files 'hdfs://path/log4j.xml' \
  --properties-file './spark.conf' \
  'hdfs://path/job_name.jar'
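
The extraClassPath settings in spark.conf point at the same HDFS location. As a sketch (the paths are placeholders, and the driver/executor split is illustrative), the relevant lines look like this:

spark.driver.extraClassPath    hdfs://path/externalLib.jar
spark.executor.extraClassPath  hdfs://path/externalLib.jar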

Not only does Spark raise an exception saying it can't find the method when I call a method from externalLib.jar, but from the start I also get these warning logs:

Source and destination file systems are the same. Not copying externalLib.jar
Source and destination file systems are the same. Not copying log4j.xml

It must come from the fact that I specify an HDFS path, because it works flawlessly when I refer to those jars on the local file system.
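
For comparison, here is the same command with local paths (again placeholders), which runs without the warning and resolves the method correctly:

spark-submit \
  --class Main \
  --jars '/local/path/externalLib.jar' \
  --files '/local/path/log4j.xml' \
  --properties-file './spark.conf' \
  '/local/path/job_name.jar'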

Maybe it isn't possible? What can I do?

Omegaspard

0 Answers