In my experience, when I apply transformation() to large data, the tasks are sometimes not evenly partitioned: the work is skewed so that only a few tasks are actually doing anything. As a result, the job runs inefficiently.

[Image: the tasks when they are not evenly partitioned]

I want to understand in more detail what causes the tasks to be skewed like this.

Any ideas?

S.Kang
  • Can you provide some code showing how you're creating the dataframe, and what is your data source? You can always try `.repartition([number_of_partitions],col([partitioning_column]))` to repartition your dataframe before calling the transformation. – Gsquare Feb 26 '17 at 09:36
  • Regardless of my code, if the tasks are not evenly distributed, what causes it? – S.Kang Feb 26 '17 at 14:32
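
One common cause, independent of any particular code, is key skew under hash partitioning: records are assigned to a partition by hashing a key modulo the number of partitions, so if one key value dominates the data, the partition it maps to receives most of the records. The sketch below is a plain-Python simulation of that idea, not Spark's actual partitioner. The helper `partition_sizes` and the sample data are hypothetical, and integer keys are used so that Python's `hash` is deterministic.

```python
def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition when each record
    is assigned to partition hash(key) % num_partitions."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[hash(key) % num_partitions] += 1
    return sizes


NUM_PARTITIONS = 4

# Hypothetical skewed data: 9,000 records share the "hot" key 7,
# and 1,000 more records have the distinct keys 0..999.
skewed_keys = [7] * 9000 + list(range(1000))
skewed = partition_sizes(skewed_keys, NUM_PARTITIONS)

# Hypothetical well-distributed key: a unique row id per record.
even_keys = list(range(10000))
even = partition_sizes(even_keys, NUM_PARTITIONS)

print("hot key :", skewed)  # [250, 250, 250, 9250]
print("row ids :", even)    # [2500, 2500, 2500, 2500]
```

With the hot key, one partition holds 9,250 of the 10,000 records, so the task that processes it runs long after the others have finished. This is why the choice of partitioning column matters when repartitioning: a column with many evenly spread values gives balanced partitions, while a column dominated by a few values does not.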

0 Answers