
I am new to parallel computing in R. I have gone through various links on Stack Overflow on the topic and wrote some initial code:

library(doParallel)
library(foreach)

detectCores()
## [1] 4
# Create cluster with desired number of cores
cl <- makeCluster(3)
# Register cluster
registerDoParallel(cl)
# Find out how many cores are being used
getDoParWorkers()
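A small extension of the setup above: it can help to confirm the registered worker count, and to shut the cluster down when you are done (a sketch; the cluster size here is arbitrary and assumes `doParallel` is installed).

```r
library(doParallel)

# Create and register a small cluster (2 workers, just for illustration)
cl <- makeCluster(2)
registerDoParallel(cl)

# Confirm how many workers the foreach backend will use
n_workers <- getDoParWorkers()

# Release the worker processes when finished; otherwise they
# linger until the R session ends
stopCluster(cl)
```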

My objective is to perform a repetitive calculation on each row; my function looks something like this:

func2 <- function(i)
{
  msgbody <- tolower(as.character(purchase$msg_body[i]))
  purchase$category[i] <- category_fun(i, msgbody)
}

For this purpose I have written a foreach loop:

foreach(i = 1:nrow(purchase)) %dopar% func2(i)

But the issue is that `func2` is supposed to write back to the data frame, yet it is not writing anything back; all the entries are the same as before.
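For what it's worth, this behaviour can be reproduced without any parallelism at all: an R function operates on a copy of the data frame, so in-place writes inside `func2` never reach the caller's object. A base-R sketch with a dummy data frame:

```r
df <- data.frame(x = 1:3, y = NA)

f <- function(i) {
  df$y[i] <- df$x[i] * 2  # modifies a local copy of df only
}

for (i in 1:3) f(i)
unchanged <- all(is.na(df$y))  # TRUE: the caller's df was never touched

# The fix is to return values and assign them once, outside the function:
g <- function(i) df$x[i] * 2
df$y <- sapply(1:3, g)
```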

Appreciate your help.

Mohit Bansal
  • Are you saving the `foreach` result to anything? Also, you will probably need to export some variables to all the workers. Try running your code in a small dummy dataset first. – Roman Luštrik Mar 23 '16 at 08:42
  • Your function is missing a proper return value. Especially with parallel processing it is very important that you do proper functional programming. Pass every object that is needed inside the function as a function argument and return every object you need outside of the function. – Roland Mar 23 '16 at 09:38

1 Answer


I believe this would work better in the scenario you're describing. You can write a function that works on a single `msg_body` string and returns the computed category, then collect the results and assign them back in one step:

    func2 <- function(i, msg_body)
    {
      return(category_fun(i, tolower(as.character(msg_body))))
    }

    result <- foreach(i = 1:nrow(purchase), .combine = c) %dopar% func2(i, purchase$msg_body[i])

    purchase$category <- result

I do think you'd be better off using the `apply()` family to solve this, though.
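For example, a serial version with `sapply()` (shown here on a dummy two-row `purchase` and a stand-in `category_fun`, since the real one isn't posted):

```r
# Dummy data and a stand-in category_fun, purely for illustration
purchase <- data.frame(msg_body = c("Bought new SHOES", "Bought a book"),
                       category = NA_character_,
                       stringsAsFactors = FALSE)
category_fun <- function(i, msg_body) {
  if (grepl("shoes", msg_body)) "footwear" else "other"
}

# One category per row, assigned back in a single step
purchase$category <- sapply(seq_len(nrow(purchase)), function(i) {
  category_fun(i, tolower(as.character(purchase$msg_body[i])))
})
```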

uday1889