
My Spark Streaming application runs on a 5-minute cycle. For the past 2 months, a recurring pattern has been visible on the driver node. Below is a snapshot of the performance metrics from Ganglia. The same behavior appeared in previous weeks too.

  1. Setup : A driver with two executors (8G, 10 cores) on EMR
  2. Spark Version: 1.5.2
  3. GC : CMS for both driver & executor
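For reference, the setup above could be expressed as `spark-submit` flags roughly as follows. This is only a sketch: the helper name `spark_submit_conf` is hypothetical, and it assumes that "8G, 10 cores" describes each executor (the driver's own size is not stated here) and that CMS is enabled through the standard `-XX:+UseConcMarkSweepGC` JVM flag.

```python
# Hypothetical helper: renders the setup described above as spark-submit flags.
# Assumption: "8G, 10 cores" applies to each of the two executors.
def spark_submit_conf():
    cms = "-XX:+UseConcMarkSweepGC"  # CMS collector for driver and executors
    conf = {
        "spark.executor.instances": "2",
        "spark.executor.memory": "8g",
        "spark.executor.cores": "10",
        "spark.driver.extraJavaOptions": cms,
        "spark.executor.extraJavaOptions": cms,
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints the flags on one line, ready to paste after `spark-submit`.
    print(" ".join(spark_submit_conf()))
```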

Important: The application was started on Saturday as shown in the charts.

RAM: [chart image]

CPU: [chart image]

Network: [chart image]

Is there any explanation for this behavior? If I need to investigate, what observations can I derive from this snapshot?

Mohitt
  • Can you show us your code? – Yuval Itzchakov Apr 07 '16 at 18:27
  • The code base is actually quite large. Briefly, it consumes two Kafka topics. Two 'moving-window' queues (Java objects) are maintained on the driver and fed with the events received from the topics. The two queues are then parallelized into two RDDs, and a scoring algorithm is run. – Mohitt Apr 08 '16 at 05:28
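To make the description in the last comment concrete, here is a minimal sketch of a driver-side moving-window queue. It is not the actual code: the class name `MovingWindowQueue`, its methods, and the time-based eviction rule are all assumptions, and it expects timestamps to arrive in non-decreasing order. In the real application, the list returned by `snapshot()` would presumably be what gets passed to `sc.parallelize` on each cycle.

```python
from collections import deque


class MovingWindowQueue:
    """Hypothetical driver-side queue that keeps only events inside the window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, event) pairs, oldest first

    def add(self, timestamp, event):
        # Append the new event, then drop everything that fell out of the window.
        self._events.append((timestamp, event))
        self._evict(timestamp)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def snapshot(self):
        # Copy of the events currently in the window, oldest first.
        return [event for _, event in self._events]


if __name__ == "__main__":
    queue = MovingWindowQueue(window_seconds=300)
    queue.add(0, "a")
    queue.add(200, "b")
    queue.add(400, "c")  # cutoff becomes 100, so "a" is evicted
    print(queue.snapshot())
```

If eviction in such a queue ever lags behind ingestion, the queue grows on the driver heap, which is one thing worth checking against the RAM chart.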

0 Answers