In our Spark app we're consuming a Kafka stream and storing the data in Cassandra.
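For context, this is roughly how the job is wired up (a simplified sketch, not our exact code; the broker address, topic, consumer group, keyspace/table names and the (id, payload) schema are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")                       // placeholder app name
      .set("spark.cassandra.connection.host", "cassandra")    // placeholder Cassandra host
    // 1-minute batch window, matching the setup listed below
    val ssc = new StreamingContext(conf, Seconds(60))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka:9092",                    // placeholder broker list
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-consumer",                         // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream over the topic's 4 partitions
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Turn each record into a row and save every batch to Cassandra
    stream
      .map(record => (record.key, record.value))
      .saveToCassandra("events_ks", "events", SomeColumns("id", "payload"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```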
First, we ran the stream without backpressure and hit a strange anomaly: the processing time stayed constant at ~1 minute, but the scheduling delay kept increasing. The batch queue piled up and eventually crashed the stream.
Any thoughts on why this could be happening? If it's not the processing, what can cause such dramatic delays?
Then we tried the same setup with backpressure enabled (and an increased maxRatePerPartition). Initially everything ran well: backpressure did its throttling job and we processed at a constant rate of ~100K records / minute.
Then, after a few hours, something happened and the rate dropped sharply to 5K / minute. The processing time was only 5-6 seconds with no scheduling delay, yet backpressure kept the rate pinned at 5K / minute and never raised it again, even though there was no apparent reason to throttle down to 5K at all.
Our Setup:
Window: 1 minute
spark.streaming.kafka.maxRatePerPartition = 500 (4 partitions * 60 sec * 500 = 120K / window)
spark.streaming.backpressure.enabled = true
spark.streaming.kafka.allowNonConsecutiveOffsets = true
spark.streaming.kafka.consumer.cache.enabled = false
Spark cluster with one master and 2 worker nodes
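For completeness, the settings above expressed as a SparkConf (just a sketch; the same values can also be passed as --conf flags to spark-submit, and the app name is a placeholder):

```scala
import org.apache.spark.SparkConf

object StreamingConf {
  // Streaming/Kafka settings from the setup list above
  val conf: SparkConf = new SparkConf()
    .setAppName("kafka-to-cassandra")                                // placeholder app name
    .set("spark.streaming.backpressure.enabled", "true")
    .set("spark.streaming.kafka.maxRatePerPartition", "500")         // 4 partitions * 60 s * 500 = 120K per 1-minute window
    .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")
    .set("spark.streaming.kafka.consumer.cache.enabled", "false")
}
```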