being able to multithread on windows would be awesome, but perhaps this problem is harder than i had thought.. :(

inside of survey:::svyby.default there is a block that's either lapply or mclapply depending on multicore=TRUE and your operating system. windows users get forced into the lapply loop no matter what, and i was wondering if there's any way to go down the mclapply path instead.. speeding up the computation.

i don't know too much about the innards of parallel processing, but i did some experiments to see if any of the windows-acceptable alternatives would work. first i tried overwriting mclapply with

mclapply <- 
    function( X , FUN , ... ){ 
        clusterApply( 
            x = X , 
            fun = FUN , 
            cl = makeCluster( detectCores() ) , ... ) 
    }

next i used fixInNamespace( svyby.default , "survey" ) to remove the line if (multicore) parallel:::closeAll()

but that only got me to the point where

> svyby(~api99, ~stype, dclus1, svymean , multicore=TRUE )
Error in checkForRemoteErrors(val) :
  3 nodes produced errors; first error: object 'svymean' not found
Anthony Damico
  • [this hack](http://www.stat.cmu.edu/~nmv/2014/07/14/implementing-mclapply-on-windows/) isn't a good solution for the `survey` package because large survey design objects have to be passed to the child processes, eating up any time savings – Anthony Damico Jul 17 '14 at 07:23

1 Answer

quoting Dr. Thomas Lumley, author of the R survey package, in response to my inquiry:

No. This approach to parallelising relies on forking, which Windows doesn't support.

It would be necessary to rewrite it to use clusterApply(), and I'm pretty sure the communications overhead would eat the speed gain. With forking, the child process gets a copy of the parent process data for free -- it's all done by the virtual<->physical memory translation hardware -- but with the cluster approach R has to send data to the child process explicitly.
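
(end of quote.) to illustrate the "send data explicitly" point, here's a minimal sketch of the cluster approach using only the base `parallel` package. this is not code from the survey package: `big_data` and `group_mean` are hypothetical stand-ins for a survey design object and a function like svymean. each worker is a fresh R session, so anything it needs has to be shipped over (here as arguments to parLapply), which is presumably also why the workers in the question couldn't find svymean.

```r
library(parallel)

# hypothetical stand-in for a (much larger) survey design object
big_data <- data.frame(
    stype = rep(c("E", "H", "M"), each = 4),
    api99 = c(600, 620, 640, 660, 500, 520, 540, 560, 700, 720, 740, 760)
)

# hypothetical stand-in for a function like svymean;
# it takes the data as an argument instead of relying on globals
group_mean <- function(level, data) {
    mean(data$api99[data$stype == level])
}

# socket-based workers are fresh R sessions, so this also runs on windows
cl <- makeCluster(2)

# in the survey case the workers would presumably also need the package loaded,
# e.g. clusterEvalQ(cl, library(survey)), before svymean could be found

# `data = big_data` is serialized and sent to the workers explicitly;
# with forking (mclapply) the children would see it without copying up front
stypes <- c("E", "H", "M")
result <- parLapply(cl, stypes, group_mean, data = big_data)
names(result) <- stypes

# always shut the workers down when finished
stopCluster(cl)
```

with a toy data frame the transfer cost is negligible, but for a large design object that serialization step is exactly the overhead described in the quote.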

Anthony Damico