I am using Databricks to build an algorithm for big data, and I am wondering why the last 1% of my job takes so long. I am writing the result to S3: the results for 111,991 records (out of 116,367) are written in 5 minutes, but the last 5,000 or so take more than an hour!
Is there a way to fix this?
In the following picture, going from 119 to 120 takes an hour, but the counter reached 199 within a few minutes.