It depends on how you configure fs.defaultFS in the core-site.xml file (tested with Spark 3.2.0 and Hadoop 3.2): a checkpointLocation given without a scheme is resolved against that default filesystem.
For HDFS
core-site.xml:
<property><name>fs.defaultFS</name><value>hdfs://hadoop-master:9000</value></property>
Code (a schemeless path resolves to HDFS storage):
df.writeStream.format("kafka").option("checkpointLocation", '/tmp/checkpoint').start()
For the local file system
core-site.xml:
<property><name>fs.defaultFS</name><value>file:///</value></property>
Code (the same path now resolves to local file storage):
df.writeStream.format("kafka").option("checkpointLocation", '/tmp/checkpoint').start()
Change storage in code
Code (the configured default is ignored when the URI scheme is given explicitly):
df.writeStream.format("kafka").option("checkpointLocation", 'hdfs://hadoop-master:9000/tmp/checkpoint').start()
# or
df.writeStream.format("kafka").option("checkpointLocation", 'file:///tmp/checkpoint').start()