Questions tagged [spark-checkpoint]
51 questions
0 votes, 1 answer
Spark Structured Streaming: is it possible to write the offset twice?
I am using Spark Structured Streaming to consume data from a Kafka topic and write the data to another Kafka sink.
I want to store the offset twice: first, when reading from the topic, I store the offset.
Second, when writing the data onto…

Shivani
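For context, a Structured Streaming checkpoint already records Kafka offsets at two points: an offsets log entry written when a micro-batch is planned, and a commits log entry written after the sink finishes that batch. The toy model below is plain Python, not Spark's real on-disk format, and the class and method names are hypothetical. It only shows why keeping both records lets a restarted query decide which batch must be replayed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyCheckpoint:
    """Hypothetical in-memory stand-in for a streaming checkpoint directory."""
    # batch_id -> {topic-partition: end offset}; written BEFORE the batch runs
    offsets: Dict[int, Dict[str, int]] = field(default_factory=dict)
    # ids of batches whose output reached the sink; written AFTER the batch runs
    commits: Set[int] = field(default_factory=set)

    def plan_batch(self, batch_id: int, end_offsets: Dict[str, int]) -> None:
        # First write: record the offsets this batch will read up to.
        self.offsets[batch_id] = dict(end_offsets)

    def commit_batch(self, batch_id: int) -> None:
        # Second write: mark the batch as fully written to the sink.
        self.commits.add(batch_id)

    def batch_to_replay(self) -> Optional[int]:
        # The latest batch that was planned but never committed must be re-run.
        if not self.offsets:
            return None
        latest = max(self.offsets)
        return None if latest in self.commits else latest


cp = ToyCheckpoint()
cp.plan_batch(0, {"input-0": 100})
cp.commit_batch(0)
cp.plan_batch(1, {"input-0": 250})  # simulate a crash before the sink write finishes
print(cp.batch_to_replay())  # 1
print(cp.offsets[1])         # {'input-0': 250}
```

Because a replayed batch can send the same records to the sink again, the Kafka sink in Structured Streaming is documented as at-least-once rather than exactly-once.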
0 votes, 1 answer
How to handle a failure scenario when Spark writes to an ORC file
I have a use case where I push data from MongoDB to HDFS as ORC files. The job runs once a day and appends the data to the ORC files that already exist in HDFS.
Now my concern is: if the job somehow fails while writing the ORC file, or…

tenderfoot
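One common way to make a daily append safe to retry is to give each run its own date partition, write it into a staging directory first, and move it into place only after the write succeeds; a rerun then replaces that day's partition instead of appending duplicates. The sketch below uses only the Python standard library on a local filesystem, with plain text files standing in for ORC output. The dt=YYYY-MM-DD layout and the publish_partition helper are assumptions for illustration. In Spark, the analogous step would be writing each day's DataFrame with mode("overwrite") to that day's partition path rather than appending to a shared directory.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def publish_partition(base: Path, day: str, rows: Iterable[str]) -> Path:
    """Hypothetical helper: write one day's rows to staging, then swap them into place."""
    final_dir = base / f"dt={day}"
    # 1. Write everything into a staging directory next to the target.
    staging = Path(tempfile.mkdtemp(prefix=f".staging-dt={day}-", dir=base))
    try:
        with open(staging / "part-00000.txt", "w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(row + "\n")
    except BaseException:
        # A failed write is discarded; the previously published partition is untouched.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    # 2. Replace any earlier attempt for the same day (a rerun overwrites, never appends).
    if final_dir.exists():
        shutil.rmtree(final_dir)
    os.replace(staging, final_dir)
    return final_dir


base = Path(tempfile.mkdtemp())
publish_partition(base, "2024-01-01", ["a", "b"])
publish_partition(base, "2024-01-01", ["a", "b"])  # retry of the same day
print(sorted(p.name for p in base.iterdir()))  # ['dt=2024-01-01']
```

The remove-then-rename step is not a single atomic operation, so a crash exactly between the two calls would leave that day missing until the job is rerun; HDFS rename semantics are also not modeled by this local sketch.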
0 votes, 1 answer
PySpark checkpoint fails on a local machine
I've just started learning PySpark using standalone mode on a local machine. I can't get the checkpoint to work. I boiled the script down to this:
spark =…

Rich
0 votes, 1 answer
HDFS URI is not resolved
In my current Spark application I am checkpointing to HDFS, and the HDFS URI looks like this:
hdfs:///tmp/log
I am getting the error org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/tmp/
I am observing that /// is…

Girish Bhat M
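The URI in this question has an empty authority: hdfs:///tmp/log parses with no host or port. Hadoop normally fills a missing authority in from the default filesystem (fs.defaultFS) when that default uses the same hdfs scheme; if it does not, for example when the default filesystem is local, nothing can be filled in. The sketch below uses Python's urllib.parse to show the difference. The qualify helper only mimics that filling-in step, and namenode:8020 is a placeholder; neither is Hadoop's actual code.

```python
from urllib.parse import urlparse, urlunparse


def qualify(path_uri: str, default_fs: str) -> str:
    """Hypothetical helper: fill a missing authority from a default FS URI."""
    uri = urlparse(path_uri)
    default = urlparse(default_fs)
    if uri.netloc:  # already has host[:port], nothing to do
        return path_uri
    if uri.scheme == default.scheme and default.netloc:
        # Borrow host:port from the default filesystem URI.
        return urlunparse(uri._replace(netloc=default.netloc))
    raise ValueError(f"URI without authority: {path_uri}")


print(repr(urlparse("hdfs:///tmp/log").netloc))              # ''
print(qualify("hdfs:///tmp/log", "hdfs://namenode:8020"))    # hdfs://namenode:8020/tmp/log
```

Under that assumption, the two likely fixes are either spelling out the full hdfs://<namenode-host>:<port>/tmp/log URI, or making sure the Hadoop configuration Spark loads sets fs.defaultFS to the cluster's HDFS URI.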
0 votes, 1 answer
Checkpoint stream data to an HDFS cluster
I have an HDFS cluster, and it has two NameNodes.
Usually, if I use an HDFS client to save data, it takes care of which NameNode to use if one of them is down.
But in Spark, for checkpointing, the API is:…

Amanpreet Khurana
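With two NameNodes in an HA setup, the usual approach is to point the checkpoint at the logical nameservice (for example hdfs://mycluster/...) rather than at one NameNode host, and let the HDFS client's failover proxy provider find the active NameNode. These settings normally come from the cluster's hdfs-site.xml; Spark also copies any spark.hadoop.* property into its Hadoop configuration. The sketch below only builds those properties as a Python dict. The nameservice name mycluster, the NameNode ids nn1/nn2, the hosts, and the ha_hdfs_conf helper are placeholders.

```python
from typing import Dict


def ha_hdfs_conf(nameservice: str, namenodes: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: spark.hadoop.* properties for an HA HDFS nameservice."""
    prefix = "spark.hadoop."
    conf = {
        prefix + "dfs.nameservices": nameservice,
        # Comma-separated NameNode ids, in the order given.
        prefix + f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Client-side class that retries against the other NameNode on failover.
        prefix + f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    for nn_id, host_port in namenodes.items():
        conf[prefix + f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = host_port
    return conf


conf = ha_hdfs_conf("mycluster", {"nn1": "namenode1.example.com:8020",
                                  "nn2": "namenode2.example.com:8020"})
checkpoint_dir = "hdfs://mycluster/checkpoints/my-query"  # logical name, no NameNode host
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each entry could be passed to spark-submit as --conf key=value, with checkpoint_dir used as the query's checkpointLocation option; if the cluster's hdfs-site.xml is already visible to Spark (for example via HADOOP_CONF_DIR), only the logical URI should be needed.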
-1 votes, 1 answer
How to set the Spark Structured Streaming checkpoint dir to a Windows local directory?
My OS is Windows 11 and my Apache Spark version is spark-3.1.3-bin-hadoop3.2.
I am trying to use Spark Structured Streaming with PySpark. Below is my simple Spark Structured Streaming code:
spark =…

Joseph Hwang
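On Windows, a local checkpoint directory is often easier to pass as an explicit file:/// URI with forward slashes than as a raw backslash path. The sketch below builds that form with pathlib and urllib.parse.quote; the path and the windows_checkpoint_uri helper are placeholders. Whether Spark 3.1.3 then works on Windows also commonly depends on the Hadoop Windows helpers (winutils.exe with HADOOP_HOME) being set up, which this sketch does not cover.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_checkpoint_uri(path: str) -> str:
    """Hypothetical helper: absolute Windows path -> file:/// URI string."""
    p = PureWindowsPath(path)
    if not (p.drive and p.root):
        # Relative or drive-relative paths ("C:foo") cannot form a stable URI.
        raise ValueError(f"expected an absolute drive path, got {path!r}")
    # as_posix() turns backslashes into forward slashes: "C:/spark/..."
    return "file:///" + quote(p.as_posix(), safe="/:")


uri = windows_checkpoint_uri(r"C:\spark\checkpoints\query1")
print(uri)  # file:///C:/spark/checkpoints/query1
# Possible use: .option("checkpointLocation", uri) on the DataStreamWriter
```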