
As per my understanding, each action in the whole application is translated into a job, each shuffle within a job is translated into a stage, and each partition of a stage's input is translated into a task.

Please correct me if I am wrong; I am unable to find any actual definition.

Akshat

1 Answer


Invoking an action inside a Spark application triggers the launch of a Spark job to fulfill it. Spark examines the DAG and formulates an execution plan. The execution plan consists of assembling the job's transformations into stages.
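
As a minimal sketch (assuming a local SparkSession and made-up data), transformations only add nodes to the DAG, and it is the action that actually launches a job:

```scala
import org.apache.spark.sql.SparkSession

object JobExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("job-example")
      .master("local[*]")
      .getOrCreate()

    // Transformations are lazy: this only builds up the DAG, nothing runs yet.
    val doubled = spark.sparkContext
      .parallelize(1 to 1000, numSlices = 8)
      .map(_ * 2)

    // The action triggers a Spark job; Spark now turns the DAG into an execution plan.
    println(doubled.count())

    spark.stop()
  }
}
```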

When Spark optimises code internally, it splits it into stages, where each stage consists of many little tasks. Each stage contains a sequence of transformations that can be completed without shuffling the full data.
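
For example, in a hypothetical word-count pipeline (reusing the `spark` session from the sketch above; the input path is illustrative), everything before `reduceByKey` can run in one stage, and the shuffle it requires starts a new one:

```scala
// "data.txt" is a made-up path, for illustration only.
val pairs = spark.sparkContext
  .textFile("data.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))            // narrow transformations: stay in stage 1

val counts = pairs.reduceByKey(_ + _) // requires a shuffle: starts stage 2

counts.collect()                      // the action launches one job with two stages
```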

Every task for a given stage is a single-threaded atom of computation consisting of exactly the same code, just applied to a different set of data. The number of tasks is determined by the number of partitions.

To manage the job flow and schedule tasks, Spark relies on an active driver process. The executor processes are responsible for executing this work, in the form of tasks, as well as for storing any data that the user chooses to cache. A single executor has a number of slots for running tasks and will run many concurrently throughout its lifetime.
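
As a small sketch of that last point (again reusing the `spark` session from above; the numbers and names are illustrative), the partition count of an RDD is what determines how many tasks its stage runs:

```scala
val rdd = spark.sparkContext.parallelize(1 to 100, numSlices = 4)
println(rdd.getNumPartitions)     // 4 -> the stage reading this RDD runs 4 tasks

// After a shuffle, the partition count (and therefore the task count of the
// next stage) can be set explicitly.
val summed = rdd.map(x => (x % 10, x)).reduceByKey(_ + _, 8)
println(summed.getNumPartitions)  // 8 -> the post-shuffle stage runs 8 tasks
```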

Nayan Sharma