
I'm experiencing strange behavior while streaming from Kafka using Spark 2.1.0/2.0.2 on AWS EMR.

"spark.streaming.concurrentJobs" was set to 1 explicitly to the streaming job but after running for a while, the job tab showed more than 1 active jobs running and such "active" jobs keep increasing.

Inside those jobs, some stages remain unexecuted forever (status is --), yet all the tasks under those jobs are shown as succeeded.

What could be wrong here? Even stranger, this behavior does not seem to occur unless I open the Spark UI frequently to check the current status.

Jobs tab: http://ibb.co/j6XkXk
Stages: http://ibb.co/budg55

Only Job 12109 was there at the beginning; things piled up after I switched tabs a couple of times.

Regards, Alex

  • How does your total delay chart look? Could you post a screenshot of the Spark UI? – maasg Jun 19 '17 at 19:33
  • Jobs tab - https://ibb.co/j6XkXk Stages - https://ibb.co/budg55 – Alex Jun 19 '17 at 21:24
  • The bottom job, 12109, has been stuck there for 4.5 hours. I tried killing it, but nothing changed at all. – Alex Jun 19 '17 at 21:29
  • Another screenshot - https://ibb.co/cgpkyQ Job 797 did not even finish but was marked as Succeeded. The next job did not even start. – Alex Jun 19 '17 at 21:52
  • Have you tried suggestions on [this](https://stackoverflow.com/questions/30142088/why-there-are-so-many-tasks-in-my-spark-streaming-job) post? Mostly to do with finding the right block and batch interval settings. – Vikas Tikoo Jun 20 '17 at 20:34
  • Also curious if you are using any stateful operations in your streaming job? – Vikas Tikoo Jun 20 '17 at 22:19

0 Answers