I have a Spark job that I run using the spark-submit command. The jar that I use is hosted on HDFS, and I reference it directly in the spark-submit command via its HDFS file path.
Following the same logic, I'm trying to do the same for the --jars option, the --files option, and also the extraClassPath option (in spark.conf), but there seems to be an issue with the fact that they point to an HDFS file path.
My command looks like this:
spark-submit \
--class Main \
--jars 'hdfs://path/externalLib.jar' \
--files 'hdfs://path/log4j.xml' \
--properties-file './spark.conf' \
'hdfs://path/job_name.jar'
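In spark.conf, the extraClassPath entries point to the same HDFS location, roughly like this (the exact paths here are placeholders):

spark.driver.extraClassPath    hdfs://path/externalLib.jar
spark.executor.extraClassPath  hdfs://path/externalLib.jar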
Not only does Spark raise an exception when I call a method that refers to externalLib.jar, telling me that it can't find the method, but right from the start I also get these warning logs:
Source and destination file systems are the same. Not copying externalLib.jar
Source and destination file systems are the same. Not copying log4j.xml
It must come from the fact that I specify an HDFS path, because everything works flawlessly when I refer to those jars on the local file system.
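For comparison, this variant with local paths works as expected (same command layout, just local files instead of HDFS; the paths are illustrative):

spark-submit \
--class Main \
--jars '/local/path/externalLib.jar' \
--files '/local/path/log4j.xml' \
--properties-file './spark.conf' \
'/local/path/job_name.jar'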
Maybe it isn't possible? What can I do?