I am using Databricks to build an algorithm for big data. Why does the last 1% of my running job take so much time? I am writing the result to S3. The result for 111991 records (out of 116367) is written in 5 minutes, but the last 5000 or so take more than an hour!

Can I fix this issue?

enter image description here

In the picture, it takes an hour for 119 to become 120, but it got to 119 in a few minutes.

Alex Ott
user15649753

1 Answer

Please check whether you are writing the file in one shot or in chunks. If you are writing it in one shot, the log switching can sometimes take a long time. Also check whether you are printing logs, since that can take time as well.
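
To illustrate the "one shot vs. chunks" point, here is a minimal sketch, not the asker's actual code. It assumes the result is a PySpark DataFrame and that Parquet is an acceptable output format. The names `chunk_sizes`, `write_in_chunks`, `output_path` and `num_chunks` are made up for this example. The first helper only shows what an even split of the row count looks like, and the second asks Spark to write several files instead of one.

```python
from typing import List


def chunk_sizes(total_rows: int, num_chunks: int) -> List[int]:
    """Return the sizes of num_chunks near-equal chunks covering total_rows."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(total_rows, num_chunks)
    # The first `extra` chunks get one additional row each.
    return [base + 1 if i < extra else base for i in range(num_chunks)]


def write_in_chunks(df, output_path: str, num_chunks: int) -> None:
    """Write a Spark DataFrame as several files instead of a single one.

    Assumptions: df is a pyspark.sql.DataFrame, output_path is a placeholder
    (for example an S3 prefix), and Parquet is the desired format.
    """
    # repartition(n) redistributes the rows into n partitions; Spark then
    # typically writes one output file per non-empty partition.
    df.repartition(num_chunks).write.mode("overwrite").parquet(output_path)
```

For example, `chunk_sizes(116367, 8)` gives eight chunks whose sizes differ by at most one row. Whether this actually removes the slow tail depends on what the last tasks are doing, so treat it as something to try rather than a guaranteed fix.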