
I'm trying to connect to my remote cluster using spark-submit and run a jar file that I've put on HDFS.

I have the following property in my $SPARK_HOME/libexec/conf/core-site.xml, which is also in $HADOOP_HOME/libexec/etc/hadoop/:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mydns.asuscomm.com:8021</value>
</property>

I can successfully view the file on my laptop using

hdfs dfs -ls hdfs:///user/stevenhurwitt/jars/
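
As far as I understand, the three-slash form works here because fs.defaultFS supplies the missing host, so it should be equivalent to the fully qualified form (assuming the same core-site.xml is on the client's classpath):

hdfs dfs -ls hdfs://mydns.asuscomm.com:8021/user/stevenhurwitt/jars/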

But when I run spark-submit using:

spark-submit --deploy-mode cluster \
--class com.steven.redditStreaming.dataFrameFromCSVFile \
hdfs:///user/stevenhurwitt/jars/redditStreaming-1.0-SNAPSHOT.jar

I get the following error:

ERROR deploy.ClientEndpoint: Exception from cluster was: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///user/stevenhurwitt/jars/redditStreaming-1.0-SNAPSHOT.jar

I have also tried hdfs:///mydns.asuscomm.com:8021/user... but still get the same error.

  • _No host_ refers to the absence of a hostname between the scheme `hdfs://` and the path `/user/...`. Your last HDFS URI is almost right; you just have to remove one slash, like this: `hdfs://mydns.asuscomm.com:8021/user...` – mazaneicha Oct 09 '21 at 17:19
  • thank you, that did the trick! also helped me to realize *why* it uses two slashes vs three in some places – steven hurwitt Oct 09 '21 at 20:22
  • Great! Glad that helped! – mazaneicha Oct 10 '21 at 15:46
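
For anyone who lands here: per mazaneicha's comment, the fix is to use two slashes followed by the host and port, so the submit command from the question becomes:

spark-submit --deploy-mode cluster \
--class com.steven.redditStreaming.dataFrameFromCSVFile \
hdfs://mydns.asuscomm.com:8021/user/stevenhurwitt/jars/redditStreaming-1.0-SNAPSHOT.jar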
