Does anyone have experience with expiration policies on S3 buckets hosting Spark Streaming checkpoint directories? I have set up an application using Spark Streaming + Kafka, and I want to use an S3 bucket with a 24-hour expiration policy to hold the checkpoint directory. However, I want to confirm that this won't interfere with checkpoint functionality. So, has anyone done this before? A minimal sketch of how I'm wiring up the checkpoint directory is below (bucket name is a placeholder; assumes the s3a filesystem connector is on the classpath):
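```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointExample {
  // Hypothetical bucket/prefix; the 24-hour expiration policy is configured
  // on the bucket itself, not anywhere in Spark.
  val checkpointDir = "s3a://my-checkpoint-bucket/spark/checkpoints"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-streaming-app")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir) // metadata and state checkpoints land here
    // ... Kafka direct stream and transformations set up here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, Spark tries to recover from the existing checkpoint. My
    // worry is what happens if S3 expiration has already deleted objects
    // that a recovering application still expects to find.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```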
- Spark Streaming manages its own deletion of checkpoint data. Why do you need this? – Yuval Itzchakov Sep 29 '16 at 21:23
- @YuvalItzchakov the bucket was configured prior to this use case. Just wanted to know if I could reuse it. – Thaddeus Gholston Sep 29 '16 at 21:25