Below is my spark-submit command:
/usr/bin/spark-submit \
--class "<class_name>" \
--master yarn \
--queue default \
--deploy-mode cluster \
--conf "spark.driver.extraJavaOptions=-DENVIRONMENT=pt -Dhttp.proxyHost=<proxy_ip> -Dhttp.proxyPort=8080 -Dhttps.proxyHost=<proxy_ip> -Dhttps.proxyPort=8080" \
--conf "spark.executor.extraJavaOptions=-DENVIRONMENT=pt -Dhttp.proxyHost=<proxy_ip> -Dhttp.proxyPort=8080 -Dhttps.proxyHost=<proxy_ip> -Dhttps.proxyPort=8080" \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
--driver-memory 3G \
--executor-memory 4G \
--num-executors 2 \
--executor-cores 3 <jar_file>
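For reference, the shape of what I understand could pass the same proxy settings to the local launcher JVM itself (this is an assumption on my side that spark-submit honors SPARK_SUBMIT_OPTS; the proxy host is a bare IP, no scheme, as in my command above):

```shell
# Assumption: spark-submit reads SPARK_SUBMIT_OPTS for the launcher JVM
# that performs --packages dependency resolution on the submit machine.
# <proxy_ip> is the same placeholder used in the command above.
export SPARK_SUBMIT_OPTS="-Dhttp.proxyHost=<proxy_ip> -Dhttp.proxyPort=8080 -Dhttps.proxyHost=<proxy_ip> -Dhttps.proxyPort=8080"
echo "$SPARK_SUBMIT_OPTS"
```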
The spark-submit command times out while resolving the package dependency. Replacing --packages with --jars works, but I would like to get to the bottom of why --packages is not working for me. Also, for http.proxyHost and https.proxyHost, should I specify only the IP address, without the http:// or https:// prefix?
Edit
Please note the following
- The machine I am deploying from and the Spark cluster are both behind an HTTP proxy.
- I know the difference between --jars and --packages; I want to get the --packages option to work in my case.
- I have tested the HTTP proxy settings on my machine: I can reach the internet from it, e.g. with curl. For some reason it feels like spark-submit is not picking up the proxy settings.
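For completeness, this is the kind of connectivity check I run from the submit machine (the proxy address is the same placeholder as above; the actual curl call is shown commented out since it only makes sense on my network):

```shell
# Point curl's proxy environment variables at the same proxy the JVM
# options reference (assumption: the proxy listens on <proxy_ip>:8080
# and is addressed with an http:// URL form in the env var).
export http_proxy="http://<proxy_ip>:8080"
export https_proxy="http://<proxy_ip>:8080"
echo "proxy set to: $https_proxy"
# Then, on my machine, a check against Maven Central succeeds:
# curl -sI https://repo1.maven.org/maven2/ | head -n 1
```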