I'm running the function parLapply inside a loop and noticing strange behaviour: the time per iteration was increasing significantly, and nothing in the code explained such an increase.
So I started clocking the functions within the loop to see which one was taking the most time, and I found that parLapply accounted for >95% of it. I then clocked the inside of the function passed to parLapply as well, to check whether the times measured inside and outside the call match. They did not, by quite a large margin. This margin grows over time, and the difference can reach seconds, which makes quite an impact on how long the algorithm takes to complete.
while (condition) {
  start.time_1 <- Sys.time()
  predictions <- parLapply(cl, array, function(i) {
    start.time_par <- Sys.time()
    # code
    end.time <- Sys.time()
    time.taken_par <- end.time - start.time_par
    print(time.taken_par)
    return(value)
  })
  end.time <- Sys.time()
  time.taken <- end.time - start.time_1
  print(time.taken)
}
I would expect time.taken to be similar to the sum of all the time.taken_par values, but it is not. The sum of all time.taken_par is usually around 0.026 seconds, while time.taken starts out at about 4 times that value, which is fine, but then grows to much more (>5 seconds).
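A note on how the per-worker times were collected: print() inside the worker function runs on the child processes, so on a PSOCK cluster its output never reaches the master console unless the cluster was created with makeCluster(..., outfile = ""). To be able to sum the time.taken_par values, I return them alongside the results; a minimal sketch, where value <- i stands in for the real #code:

library(parallel)

results <- parLapply(cl, array, function(i) {
  start <- Sys.time()
  value <- i  # stand-in for the real #code
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  list(value = value, elapsed = elapsed)
})
worker.time <- sum(vapply(results, `[[`, numeric(1), "elapsed"))  # sum of time.taken_par
predictions <- lapply(results, `[[`, "value")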
Can anyone explain what is going on, and/or whether what I think should happen is wrong? Is it a memory issue?
Thanks for the help!
Edit:
The output of parLapply is the following. However, in my tests there are 10 lists instead of just 3 as in this example. The size of each individual list returned by parLapply is always the same, in this case 25.
[1] 11
[[1]]
          1           2           3           4           5           6           7           8           9          10          11          12          13          14
-0.01878590 -0.03462315 -0.03412670 -0.06016549 -0.02527741 -0.06271799 -0.05429947 -0.02521108 -0.04291305 -0.03145491 -0.08571382 -0.07025075 -0.07704650  0.25301839
         15          16          17          18          19          20          21          22          23          24          25
-0.02332236 -0.02521089 -0.01170326  0.41469539 -0.15855689 -0.02548952 -0.02545446 -0.10971302 -0.02521836 -0.09762386  0.02044592

[[2]]
          1           2           3           4           5           6           7           8           9          10          11          12          13          14
-0.01878590 -0.03462315 -0.03412670 -0.06016549 -0.02527741 -0.06271799 -0.05429947 -0.02521108 -0.04291305 -0.03145491 -0.08571382 -0.07025075 -0.07704650  0.25301839
         15          16          17          18          19          20          21          22          23          24          25
-0.02332236 -0.02521089 -0.01170326  0.41469539 -0.15855689 -0.02548952 -0.02545446 -0.10971302 -0.02521836 -0.09762386  0.02044592

[[3]]
          1           2           3           4           5           6           7           8           9          10          11          12          13          14
-0.01878590 -0.03462315 -0.03412670 -0.06016549 -0.02527741 -0.06271799 -0.05429947 -0.02521108 -0.04291305 -0.03145491 -0.08571382 -0.07025075 -0.07704650  0.25301839
         15          16          17          18          19          20          21          22          23          24          25
-0.02332236 -0.02521089 -0.01170326  0.41469539 -0.15855689 -0.02548952 -0.02545446 -0.10971302 -0.02521836 -0.09762386  0.02044592
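For reference, this is how I check the size of what parLapply returns (lengths() and object.size() are base/utils functions; predictions is the result from the loop above):

lengths(predictions)                           # length of each returned list: 25
print(object.size(predictions), units = "Kb")  # stays the same every iteration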
Edit2:
OK, I have found out what the problem was. I have a list that I initialize with vector("list", 10000), and in each iteration of the loop I add a list of lists to it. Each of these lists of lists is 6656 bytes, so over the 10000 iterations the total doesn't even add up to 0.1 GB. However, as this preallocated list fills up, the performance of the parallelization degrades. I have no idea why this is happening, as I'm running the script on a machine with 64 GB of RAM. Is this a known problem?
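One possibility I am considering (unconfirmed): if the loop runs inside another function, the anonymous function passed to parLapply() closes over that function's environment, so serializing the worker function ships that whole environment, growing list included, to every worker on every call. Under that assumption, a sketch of a workaround is to define the worker once and detach it from the enclosing environment (cl, input, and worker are stand-in names; the real #code goes in the body):

library(parallel)

cl <- makeCluster(detectCores() - 1)
input <- 1:10                     # stand-in for `array` above

results <- vector("list", 10000)  # the preallocated list that fills up

worker <- function(i) {
  # ... real #code goes here ...
  i
}
# Point the worker at the global environment so serializing it for the
# PSOCK workers never drags the growing `results` list along with it.
environment(worker) <- globalenv()

for (k in seq_len(10000)) {
  results[[k]] <- parLapply(cl, input, worker)
}

stopCluster(cl)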