I am exploring my options for checkpointing with Spark Structured Streaming and have read that the "eventual consistency" of S3 is not ideal for checkpointing. I am trying to determine whether this is accurate. I doubt that a Spark Structured Streaming job would write to the checkpoint location and then read from it within the same job to determine where to continue from. Wouldn't the current checkpoint also be held in memory within the context of the job (meaning that reading from S3 would not be required to determine the current checkpoint)?
I am able to specify an S3 location for checkpointing, but I am trying to determine whether doing so is against best practices. Can someone please clarify whether using S3 as the checkpoint location is suboptimal and, if so, why?