Questions tagged [spark-checkpoint]

51 questions
0 votes, 1 answer

Spark Structured Streaming: Is it possible to write the offset twice?

I am using Spark Structured Streaming to consume data from a Kafka topic and write the data to another Kafka sink. I want to store the offset twice: first when reading from the topic, and second when writing the data onto…
0 votes, 1 answer

How to handle a failure scenario when Spark writes to an ORC file

I have a use case where I push data from MongoDB to HDFS as an ORC file. The job runs once a day and appends the data to the existing ORC file in HDFS. My concern is that if the job somehow fails while writing to the ORC file, or…
0 votes, 1 answer

PySpark checkpoint fails on local machine

I've just started learning PySpark in standalone mode on my local machine. I can't get the checkpoint to work. I boiled the script down to this: spark =…
Rich
0 votes, 1 answer

HDFS URI is not resolved

In my current Spark application I am checkpointing to HDFS using a URI like hdfs:///tmp/log, and I am getting this error: org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/tmp/ I am observing that /// is…
Girish Bhat M
0 votes, 1 answer

Checkpoint stream data to HDFS cluster

I have an HDFS cluster with two NameNodes. Usually, if I use an HDFS client to save data, it takes care of which NameNode to use when one of them is down. But in Spark, for checkpointing, the API is:…
Amanpreet Khurana
-1 votes, 1 answer

How to set the Spark Structured Streaming checkpoint directory to a local Windows directory?

My OS is Windows 11 and my Apache Spark version is spark-3.1.3-bin-hadoop3.2. I am trying to use Spark Structured Streaming with PySpark. Below is my simple Spark Structured Streaming code. spark =…
Joseph Hwang