
When running Spark Streaming with the streaming-kafka-0-8-integration direct approach, if batches are getting queued, will the executors pull the data for the queued batches into their memory? If not, what is the harm in having a very long backlog of batches?

phoenix

1 Answer

Yes, Spark will pull the data from the Kafka queue and process it in memory. The harm would be pressure on Kafka's resources, since Kafka has to hold the long backlog of batches.
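
To see how much data each batch actually covers, one option is to log the offset ranges attached to every batch's RDD. This is only a minimal sketch, assuming the spark-streaming-kafka-0-8 artifact is on the classpath. The app name, batch interval, broker address and topic name are placeholders and do not come from the question.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object OffsetRangeLogger {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and batch interval; the master is expected from spark-submit.
    val conf = new SparkConf().setAppName("offset-range-logger")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed broker address and topic name; replace with real values.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("events")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // The cast only works on the RDD produced directly by the direct stream,
      // before any transformation is applied to it.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        println(s"${r.topic}-${r.partition}: ${r.fromOffset} until ${r.untilOffset} " +
          s"(${r.count()} messages)")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Comparing the logged `untilOffset` values with the latest offsets on the brokers gives a rough picture of how far the job is lagging behind the topic.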

  • Sorry, maybe it is a typo, but why would Kafka have a long backlog? – phoenix Feb 05 '18 at 16:52
  • Kafka actually stores the data in the queue. Data stays in the backlog (cluster storage) until it is read from the queue. There should not be any performance issue because of this. – Sandish Kumar H N Feb 09 '18 at 08:08