
In my current Spark application I am checkpointing to HDFS, and the HDFS URI looks like this:

hdfs:///tmp/log

I am getting an error: `org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/tmp/`

I am observing that `///` is resolved to `/`.

Is this a bug, or am I missing some configuration? Thank you.

Girish Bhat M
  • what is your question? the error is because the processing user doesn't have authority for `hdfs:/tmp/`, and regarding your second query, both `hdfs:///tmp/log` and `hdfs:/tmp/log` are the same – Ramesh Maharjan May 07 '18 at 04:21
  • when I pass the path as `hdfs://nameNode/tmp/log` I am not seeing the error, and the executing user has all the privileges on the HDFS path. – Girish Bhat M May 07 '18 at 04:34
  • An "authority" is part of the URI definition, not permissions – OneCricketeer May 07 '18 at 17:58
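As the last comment notes, the "authority" in the exception is the host component of the URI, not a permission. A quick sketch with Python's standard `urllib.parse` (outside Spark entirely) shows why `hdfs:///tmp/log` has an empty authority while `hdfs://nameNode/tmp/log` does not:

```python
from urllib.parse import urlparse

# hdfs:///tmp/log -- nothing between "//" and the path, so the authority is empty
no_auth = urlparse("hdfs:///tmp/log")
print(no_auth.netloc)  # '' (empty authority -> "Uri without authority" error)
print(no_auth.path)    # '/tmp/log'

# hdfs://nameNode/tmp/log -- the authority is 'nameNode'
with_auth = urlparse("hdfs://nameNode/tmp/log")
print(with_auth.netloc)  # 'nameNode'
print(with_auth.path)    # '/tmp/log'
```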

1 Answer


It depends on how you configure `fs.defaultFS` in the core-site.xml file. (Tested with Spark 3.2.0, Hadoop 3.2.)

For HDFS

core-site.xml:

<property><name>fs.defaultFS</name><value>hdfs://hadoop-master:9000</value></property>

Code (an unqualified path resolves to HDFS):

df.writeStream.format("kafka").option("checkpointLocation", '/tmp/checkpoint').start()

For file

core-site.xml:

<property><name>fs.defaultFS</name><value>file:///</value></property>

Code (an unqualified path resolves to the local file system):

df.writeStream.format("kafka").option("checkpointLocation", '/tmp/checkpoint').start()

Overriding the storage in code

Code (independent of the default value):

df.writeStream.format("kafka").option("checkpointLocation", 'hdfs://hadoop-master:9000/tmp/checkpoint').start()

# or

df.writeStream.format("kafka").option("checkpointLocation", 'file:///tmp/checkpoint').start()
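The behavior all three variants rely on can be sketched in plain Python. The `qualify` helper below is hypothetical, not a real Hadoop API: it only illustrates that a path without a scheme inherits `fs.defaultFS`, while a fully qualified URI is used as-is:

```python
def qualify(path: str, default_fs: str) -> str:
    """Illustrative sketch (not the real Hadoop API): paths that already
    carry a scheme are kept unchanged; bare paths inherit fs.defaultFS."""
    if "://" in path:
        return path  # already fully qualified, e.g. hdfs://host:9000/...
    return default_fs.rstrip("/") + path

# An unqualified checkpoint path inherits the configured file system
print(qualify("/tmp/checkpoint", "hdfs://hadoop-master:9000"))
# hdfs://hadoop-master:9000/tmp/checkpoint

# A fully qualified URI ignores fs.defaultFS
print(qualify("file:///tmp/checkpoint", "hdfs://hadoop-master:9000"))
# file:///tmp/checkpoint
```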
Amir Bashiri