What causes tasks to be unevenly partitioned in Spark?

Asked Feb 26 '17 at 07:17

Active Feb 26 '17 at 07:29


In my experience, when I apply a transformation to a large dataset, the work is sometimes not evenly distributed across tasks: it is skewed so that only a few tasks do most of the processing. As a result, the job runs inefficiently.

When the tasks are not evenly partitioned:

I would like to understand in more detail what causes the work to be skewed toward a few tasks.

Any ideas?

edited Feb 26 '17 at 07:29

asked Feb 26 '17 at 07:17

S.Kang

Can you provide some code showing how you're creating the DataFrame, and say what your data source is? You can always try `.repartition([number_of_partitions],col([partitioning_column]))` to repartition your DataFrame before calling the transformation – Gsquare Feb 26 '17 at 09:36
Regardless of my code, if the task is not evenly distributed, what causes it? – S.Kang Feb 26 '17 at 14:32
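
One common cause, offered as a sketch rather than a diagnosis of this particular job: when records are distributed by the hash of a key, every record with the same key lands in the same partition, so a single very frequent key produces one oversized task. The plain-Python simulation below needs no Spark. The data, the `partition_sizes` helper, and the use of `zlib.crc32` as a stand-in for the partitioner's hash function are all assumptions made for illustration.

```python
from collections import Counter
import zlib


def partition_sizes(keys, num_partitions):
    """Count records per partition when partition = hash(key) mod num_partitions."""
    sizes = Counter()
    for key in keys:
        # crc32 is only a deterministic stand-in for the real hash function
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return [sizes.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: one "hot" key accounts for 90% of the records
skewed_keys = ["hot"] * 9000 + [f"user_{i}" for i in range(1000)]
# Hypothetical data: every record has a distinct key
uniform_keys = [f"user_{i}" for i in range(10000)]

skewed = partition_sizes(skewed_keys, 8)
uniform = partition_sizes(uniform_keys, 8)

print("skewed :", skewed)
print("uniform:", uniform)
```

In the skewed case one partition holds at least the 9000 "hot" records no matter how many partitions are requested, which is why repartitioning on the same skewed column may not help.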


0 Answers