I am trying to convert this code into a version that runs on Windows:

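library(parallel)    # detectCores(), mclapply()
library(data.table)  # fread()

# dirPath is assumed to be defined earlier
numCores <- detectCores()
results <- mclapply(seq(1, 500), function(file, fID){
  myData <- fread(file.path(dirPath, fID, paste0(file, ".csv")))
  return(cbind(myData, rep(file, nrow(myData))))
}, mc.cores = numCores, fID = 1)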

Based on this tutorial, I wrote the following code...

UPDATE: The correct code is provided below:

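library(parallel)

getAllMyData <- function(numCores, folderID)
{
  dirPath <- paste0("D:/home/", folderID, "/")
  # Start one worker per requested core instead of a hard-coded 4
  cl <- makeCluster(numCores)
  # Shut the workers down even if one of the reads fails
  on.exit(stopCluster(cl))
  allTrips <- parLapply(cl, 1:200, function(z) {
    # dirPath is sent to the workers together with this closure
    myData <- read.csv(paste0(dirPath, z, ".csv"))
    cbind(myData, rep(z, nrow(myData)))
  })
  allTrips
}

numCores <- detectCores()
allMyData <- getAllMyData(numCores, 1)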
Klausos Klausos

1 Answer

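Your first snippet passes a function with this signature:

function(file, fID)

Your second snippet, by contrast, uses

function(dirPath,fID)

That’s the error: the first parameter receives each element of the sequence, so in the second version the file index is bound to dirPath rather than to file.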
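
To make the argument mismatch concrete, here is a minimal sketch using plain lapply, which passes arguments in the same way as mclapply and parLapply: each element of the sequence goes to the first parameter, and any extra named arguments are forwarded unchanged. The functions describeArgs and renamed are hypothetical stand-ins, not part of the original code; they only report what they receive.

```r
# Hypothetical stand-in for the worker function: it only reports its arguments.
describeArgs <- function(file, fID) {
  paste0("file=", file, ", fID=", fID)
}

# Each element of 1:3 is passed as the first argument (`file`);
# fID is supplied by name and is the same for every call.
res <- lapply(1:3, describeArgs, fID = 1)
print(unlist(res))

# Renaming the first parameter does not change what it receives:
# `dirPath` still gets the index 1, 2, 3, not a directory path.
renamed <- function(dirPath, fID) {
  paste0("dirPath=", dirPath, ", fID=", fID)
}
res2 <- lapply(1:3, renamed, fID = 1)
print(unlist(res2))
```

If the function body still builds the file name from a variable called file, that variable no longer refers to the loop index once the parameter has been renamed.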

Konrad Rudolph
  • Thank you. I updated my post with the correct code. However, it's strange that the runtime is almost the same as that of the serial code. – Klausos Klausos Jan 12 '15 at 17:14
  • @KlausosKlausos It's not so surprising: reading data from disk (which is the majority of what your code is doing) cannot be parallelised efficiently because it is an IO-bound, not a CPU-bound, operation. – Konrad Rudolph Jan 15 '15 at 08:22
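
One way to check the IO-bound explanation on your own machine is to time the serial and parallel reads side by side. The following is only a sketch under assumptions: the temporary directory, the 20 small generated CSV files, the helper readOne, and the 2-worker cluster are all made up for illustration. Files this small are likely to be served from the operating system's cache, so the timings will not be representative of reading large files from disk.

```r
library(parallel)

# Hypothetical test data: 20 small CSV files in a temporary directory.
tmpDir <- file.path(tempdir(), "io_bench")
dir.create(tmpDir, showWarnings = FALSE)
for (i in 1:20) {
  write.csv(data.frame(x = 1:1000, y = i),
            file.path(tmpDir, paste0(i, ".csv")), row.names = FALSE)
}

# Hypothetical helper: read one file and tag every row with its file index.
readOne <- function(z, dirPath) {
  myData <- read.csv(file.path(dirPath, paste0(z, ".csv")))
  cbind(myData, fileID = rep(z, nrow(myData)))
}

# Serial baseline.
serialTime <- system.time(
  serialRes <- lapply(1:20, readOne, dirPath = tmpDir)
)

# Parallel version; worker start-up is deliberately left out of the timing.
cl <- makeCluster(2)
parallelTime <- system.time(
  parallelRes <- parLapply(cl, 1:20, readOne, dirPath = tmpDir)
)
stopCluster(cl)

print(serialTime["elapsed"])
print(parallelTime["elapsed"])
```

If the two elapsed times are close, the work is dominated by reading rather than by computation, which matches the comment above.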