I have a simple setup that reads from Kafka and writes to the local console:
The SparkSession is created with .master("local[*]"), roughly like this (the app name is just a placeholder):
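import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")          // run locally, no cluster manager
  .appName("KafkaToConsole")   // placeholder app name
  .getOrCreate()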
I then start the stream with:
var df = spark.readStream
  .format("kafka")
  .options(...)
  .load()

df = df.select("some_column")

df.writeStream
  .format("console")
  .outputMode("append")
  .start()
  .awaitTermination()
The same Kafka setup works perfectly fine with a batch/normal DataFrame (a rough sketch is below the exception), but for this streaming job I get the exception:
Permission denied: user=user, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
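For comparison, the working batch version looks roughly like this (the option values are placeholders for my actual Kafka settings):

val batchDf = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "my_topic")                  // placeholder
  .load()

batchDf.select("some_column").show()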
Why does the streaming query need write access to HDFS when all I want is to print the data locally to the console, and how can I solve this?