I am running a streaming application that processes data from Kafka to Kafka using Spark. If I use the latest offsets, it works as expected and runs without any issue.
However, a bulk transaction (about 200,000 records) was written to the source topic, and when I use earliest the job has to process all of that data. In that case my Spark job does not process the data and gets stuck after 3 stages. Can someone suggest how I should handle this so that I can process this bulk data?
I am using the configurations below:
TRIGGERFREQUENCY: 1 second
STARTINGOFFSETS: earliest
--num-executors 6
--driver-cores 6
--driver-memory 8G
--executor-cores 6
--executor-memory 8G
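For reference, here is a minimal sketch of how the stream is wired up (this assumes Structured Streaming with the Kafka source; the broker address, topic names, and checkpoint path are placeholders, not my real values):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .appName("kafka-to-kafka-stream")
  .getOrCreate()

// Read from the source topic; startingOffsets = earliest so the bulk backlog is picked up
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
  .option("subscribe", "source-topic")              // placeholder topic
  .option("startingOffsets", "earliest")
  .load()

// Write back to Kafka with a 1-second processing-time trigger
val query = input
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")                 // placeholder broker
  .option("topic", "target-topic")                                  // placeholder topic
  .option("checkpointLocation", "/tmp/checkpoints/kafka-to-kafka")  // placeholder path
  .trigger(Trigger.ProcessingTime("1 second"))
  .start()

query.awaitTermination()
```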
I have also tried the configuration below in my Spark application:
--conf spark.streaming.backpressure.enabled=true
--conf spark.streaming.backpressure.initialRate=60
--conf spark.streaming.kafka.maxRatePerPartition=50
These settings were meant to control the number of events per batch, but they do not seem to take effect: I can see 30,000 records in the first batch, which Spark is not able to process in a single batch, so the job gets stuck.
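In case it matters, this is the equivalent of those --conf flags set programmatically. I am not sure whether the Structured Streaming Kafka source honours these spark.streaming.* settings at all, which may be why they are not being applied:

```scala
import org.apache.spark.sql.SparkSession

// Same values as the --conf flags above, attached via the session builder.
// Note: these are spark.streaming.* (DStream-era) settings; I am unsure
// whether the Structured Streaming Kafka source reads them at all.
val spark = SparkSession.builder()
  .appName("kafka-to-kafka-stream")
  .config("spark.streaming.backpressure.enabled", "true")
  .config("spark.streaming.backpressure.initialRate", "60")
  .config("spark.streaming.kafka.maxRatePerPartition", "50")
  .getOrCreate()
```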