
I have written a relatively large function (about 500 lines, including saving some data, training an ANN, and taking its predictions) whose output is a list of data.frames. The problem appears when the output is supposed to be a bigger list (e.g. 30 000 data.frames in this list). I use the function this way:

library(parallel)

output_list <- mclapply(c(1:30000), FUN, mc.cores = detectCores(), mc.preschedule = FALSE)

and when I use it for

c(1:1000)

it takes about 100 seconds, i.e. roughly 10 data.frames per second. But when I use it for, let's say,

c(1:10000)

it slows down dramatically and takes about 6 500 seconds, i.e. only about 1.5 data.frames per second. And the larger the index vector, the slower it gets.

I have tried to fix it by slimming down FUN (little effect on a small index vector, and the same problem remains on bigger computations).

I have also tried to fix it with a for loop that computes 200 cases, puts them into an initially empty list, then computes the next 200 and joins them onto that list (which by then already holds the frames from the previous iterations), and so on until the end; see the sketch below.
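For illustration, here is a minimal sketch of that chunked approach (the one-line FUN is just a hypothetical placeholder for my real 500-line function; chunk size 200 as described):

library(parallel)

FUN <- function(i) data.frame(id = i)  # placeholder for the real 500-line function

n <- 30000
chunk_size <- 200
output_list <- vector("list", n)  # pre-allocate the full result list

for (start in seq(1, n, by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, n)
  partial_output <- mclapply(idx, FUN, mc.cores = detectCores(), mc.preschedule = FALSE)
  output_list[idx] <- partial_output
  rm(partial_output)
  gc()  # force a garbage collection between chunks
}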

But the result is the same: it slows down dramatically again. I suspect the problem lies somewhere in the environment handling or in some memory issue. Does anyone have experience with this? Any advice on how to solve it? The computation per item stays the same, so I don't understand why it works well for smaller runs and slows down for huge ones. Thank you for your advice.

Bury
  • What happens if you write an external loop which feeds your `mclapply` indices `1:1000`, then starts a new `mclapply` with `1001:2000`, and so on? – Carl Witthoft Feb 10 '15 at 14:02
  • I tried to fix it exactly this way. In the end it is equally slow as when I use `mclapply` with indices `1:10000`. I also added `rm(partial_output)` inside the loop to delete objects that are no longer needed after each step of the loop. – Bury Feb 10 '15 at 14:16

0 Answers