I'm setting up a long-running Apache Spark Streaming job that performs (non-parallelized) streaming using an InputDStream.
What I'm trying to achieve is that when a batch on the queue takes too long (based on a user-defined timeout), I want to be able to skip the batch and abandon it completely, then continue with the rest of the execution.
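Roughly, my setup looks like the sketch below (the queue-based source and all names are just placeholders for my actual job):

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf  = new SparkConf().setAppName("single-stream-job").setMaster("local[2]")
val ssc   = new StreamingContext(conf, Seconds(10))
val queue = new mutable.Queue[RDD[String]]()

// queueStream gives an InputDStream; with oneAtATime (the default),
// each dequeued RDD becomes one batch.
val stream = ssc.queueStream(queue)

stream.foreachRDD { rdd =>
  // The per-batch work here can occasionally take much longer than the
  // batch interval; this is the part I'd like to abandon on a timeout.
  rdd.foreach(record => println(record))
}

ssc.start()
```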
I wasn't able to find a solution to this problem in the Spark API or online. I looked into using StreamingContext.awaitTerminationOrTimeout, but this kills the entire StreamingContext on timeout, whereas all I want to do is skip/kill the current batch.
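Concretely, this is what that attempt looked like (batchTimeoutMs is just my user-defined timeout, and ssc is the context from the sketch above):

```scala
val batchTimeoutMs = 60000L // placeholder value for the user-defined timeout

// Returns false when the timeout elapses before the context terminates,
// but the only thing I can do about it is stop the whole context:
val terminated = ssc.awaitTerminationOrTimeout(batchTimeoutMs)
if (!terminated) {
  // This kills everything, not just the batch that is running too long.
  ssc.stop(stopSparkContext = false, stopGracefully = false)
}
```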
I also considered using mapWithState, but it doesn't seem to apply to this use case. Finally, I considered setting up a StreamingListener that starts a timer when a batch begins and then stops/skips/kills the batch once it hits a certain timeout threshold (see the sketch below), but there still doesn't seem to be a way to kill the batch.
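For reference, this is roughly the listener I had in mind (a sketch; BatchTimeoutListener and the timeout value are my own placeholders). It can detect that a batch is overdue, but at that point I can't find anything to call that would kill only that batch:

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.streaming.scheduler.{
  StreamingListener, StreamingListenerBatchCompleted, StreamingListenerBatchStarted
}

class BatchTimeoutListener(timeoutMs: Long) extends StreamingListener {
  private val startedAt = new AtomicLong(-1L)

  override def onBatchStarted(batchStarted: StreamingListenerBatchStarted): Unit =
    startedAt.set(System.currentTimeMillis())

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit =
    startedAt.set(-1L)

  // True once the currently running batch has exceeded the timeout.
  def isOverdue: Boolean = {
    val t = startedAt.get()
    t > 0 && System.currentTimeMillis() - t > timeoutMs
  }
}

ssc.addStreamingListener(new BatchTimeoutListener(timeoutMs = 60000L))
// ...even when isOverdue becomes true, there is no API I can see to
// skip or kill just the currently running batch.
```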
Thanks!