When using foreach and doRedis, the doRedis workers wait until all jobs have reached the Redis server before they begin processing. Is it possible to have them begin before all the preprocessing has finished?

I am using an iterator, which is working great: preprocessing happens 'just in time' and the job data begins to hit the server as the iterator runs. I can't seem to take advantage of this behavior, though, because the workers just wait until all jobs have been uploaded.

Example code:

library(foreach)
library(doRedis)

registerDoRedis("worklist", "0.0.0.0")

foreach(var = complex.iter(1:1E6)) %dopar% {
    process.function(var)
}

In this example, `complex.iter` takes a while and there are many elements to iterate over, so it would be great if the workers started running `process.function()` before all the preprocessing is finished. Unfortunately, they seem to wait until `complex.iter` has run on all elements.

I have set `.inorder=FALSE`.

Any suggestions as to how to achieve this desired behavior? Thanks.

nate

2 Answers

You can try a couple of things to make it run more smoothly: one is setting the chunk size, and the other is starting local workers to get tasks going in the background.

The doRedis vignette (PDF) explains how these two functions are used properly:

`startLocalWorkers` & `setChunkSize`

Without more information on the data, functions, and tasks, it is hard to help you any more than that.
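For illustration, here is a minimal sketch of how the two might be combined, assuming a Redis server on localhost; the queue name, chunk size, and worker count are placeholders:

library(doRedis)

registerDoRedis("worklist", host = "localhost")

# Group tasks into chunks of 100 to reduce per-task queue overhead
setChunkSize(100)

# Start two background R workers on this machine; they begin pulling
# tasks from the queue as soon as tasks appear on it
startLocalWorkers(n = 2, queue = "worklist", host = "localhost")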

sconfluentus
  • Load balancing is fine. The issue is with when the remote workers begin processing jobs. – nate Aug 07 '16 at 18:26
  • Your iteration is taking place inside a function (foreach creates a closed environment). Unless you make the function push data out prior to completion, it will not be available to the Redis task whether you have parallel cores assigned or not. You need to think about how to manage data within that environment and push it out incrementally if you want the processes to occur simultaneously, or you will need to include the Redis script within the function to promote concurrent work as each iteration is completed. – sconfluentus Aug 08 '16 at 17:07
  • Data is hitting redis immediately. Eg, the chunks to iterate over (eg `var` above) and the code to run (eg `process.function(var)` above). That's not the problem. – nate Aug 08 '16 at 18:35
  • Include the working code, not the parallel code, if you want help with getting the work done. It is impossible to tell from what you have shared. – sconfluentus Aug 08 '16 at 19:08

In case others have the same question:

The answer is currently no: the iterator completes aggregation of all task data before the jobs are uploaded and distributed to workers. Relevant discussion here: https://github.com/bwlewis/doRedis/issues/39

I was also wrong in my question: the iterator actually completes before the data is uploaded. Still, the blocking upload causes the workers to wait not only until the iterator has finished but also until the upload has completed.
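One workaround (my own sketch, not from the linked issue) is to keep the iterator cheap and move the expensive per-element preprocessing into the worker body, so the task data hits the queue almost immediately and the heavy work runs in parallel on the workers. Here `preprocess()` is a hypothetical per-element version of `complex.iter`:

library(foreach)
library(doRedis)

registerDoRedis("worklist", host = "localhost")
setChunkSize(1000)  # many tiny tasks; chunking keeps queue overhead down

results <- foreach(i = 1:1E6, .inorder = FALSE) %dopar% {
    var <- preprocess(i)  # hypothetical: expensive step now runs on the worker
    process.function(var)
}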

I'll update the answer if we implement any changes.

nate