I am trying to understand what the entries in my Spark UI signify.

[Screenshot of the Spark UI]

Calling an action results in the creation of a job. I am finding it hard to understand:

  1. How many of these jobs get created? Is that proportional to the number of micro-batches?
  2. What does the Duration column signify?
  3. What is the effect of setting the batch duration when instantiating the streaming context? Where is that visible in the Spark UI?

new StreamingContext(sparkSession.sparkContext, Seconds(50))

fledgling

1 Answer

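1. The number of jobs is proportional to the number of micro-batches. Say your streaming context's batch interval is 50 seconds: a new micro-batch is then created every 50 seconds (roughly 1.2 per minute), and each action you run on a batch creates a job.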
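As a rough sketch of that arithmetic, the snippet below estimates how many micro-batches and jobs show up over a time window. `BatchMath`, `batchesIn` and `estimatedJobs` are made-up names for illustration, not Spark APIs, and the estimate assumes a fixed number of actions per batch with one job per action (some actions can trigger more than one job, so treat it as an approximation).

```scala
// Hypothetical helper for illustration only; nothing here is a Spark API.
object BatchMath {
  // Number of complete micro-batches created in a window, given the batch interval.
  def batchesIn(windowSeconds: Long, batchIntervalSeconds: Long): Long =
    windowSeconds / batchIntervalSeconds

  // Assumption: every batch runs the same number of actions, one job per action.
  def estimatedJobs(batches: Long, actionsPerBatch: Int): Long =
    batches * actionsPerBatch

  def main(args: Array[String]): Unit = {
    val batchIntervalSeconds = 50L // matches Seconds(50) in the question
    val batches = batchesIn(600L, batchIntervalSeconds) // 10-minute window
    println(s"batches in 10 minutes: $batches")
    println(s"estimated jobs with 2 actions per batch: ${estimatedJobs(batches, 2)}")
  }
}
```

With a 50-second interval, a 10-minute window holds 12 micro-batches, so two actions per batch would give about 24 jobs in the Jobs tab.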

2. Duration is the amount of time taken to process a single micro-batch or job. Ideally, processing a micro-batch should take less time than the batch interval. With a 50-second interval, each micro-batch job should complete well within 50 seconds; otherwise batches queue up and the scheduling delay keeps growing.
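If you want to check this from code rather than the UI, one option is a `StreamingListener`. The sketch below is only an illustration: `SlowBatchListener` and `exceedsInterval` are hypothetical names, and it assumes you are using the DStream API (`StreamingContext`) as in the question.

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical listener (not part of Spark): reports batches that took
// longer to process than the batch interval.
class SlowBatchListener(batchIntervalMs: Long) extends StreamingListener {
  override def onBatchCompleted(event: StreamingListenerBatchCompleted): Unit = {
    val info = event.batchInfo
    // processingDelay is an Option[Long] in milliseconds; skip the batch if it is empty.
    info.processingDelay.foreach { processingMs =>
      if (SlowBatchListener.exceedsInterval(processingMs, batchIntervalMs)) {
        println(s"Batch ${info.batchTime} took $processingMs ms, " +
          s"longer than the $batchIntervalMs ms batch interval")
      }
    }
  }
}

object SlowBatchListener {
  // Pure check, kept separate so it can be tested without a running stream.
  def exceedsInterval(processingMs: Long, batchIntervalMs: Long): Boolean =
    processingMs > batchIntervalMs
}
```

You would register it before starting the context, for example `ssc.addStreamingListener(new SlowBatchListener(Seconds(50).milliseconds))`, where `ssc` is your `StreamingContext`.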

3. If you open the Streaming tab in the UI while the application is running, you can see that a new micro-batch is created every 50 seconds.

When you click on a job, you get the details of the stages of that single micro-batch/job. I guess that is the screenshot you have shared. There, Duration is the time taken by each stage in the job to complete.