doParallel is an R package that acts as a “parallel backend” for the foreach package. It provides the mechanism needed to execute foreach loops in parallel.
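As a rough illustration of what "parallel backend" means here, the following minimal sketch registers a small cluster and runs a foreach loop on it with `%dopar%`. The worker count of 2 and the variable names `cl` and `squares` are arbitrary choices for the example, not recommendations.

```r
library(doParallel)  # also attaches foreach and parallel

# Assumption: two local worker processes are enough for a demonstration.
cl <- makeCluster(2)
registerDoParallel(cl)  # tell foreach to send %dopar% work to this cluster

# Each iteration runs on a worker; .combine = c collects results into a vector.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)  # release the worker processes
print(squares)
```

Without a registered backend, `%dopar%` falls back to sequential execution and issues a warning, which is why the `registerDoParallel()` call matters.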
Questions tagged [doparallel]
453 questions
2
votes
0 answers
Several questions on running Rmpi and foreach on an HPC cluster
I am queueing and running an R script on an HPC cluster via sbatch and mpirun; the script is meant to use foreach in parallel. To do this I've used several useful questions & answers from StackOverflow: R Running foreach dopar loop on HPC MPIcluster,…

pglpm
2
votes
2 answers
Parallelization/Optimization of R loops containing *apply
I am working on implementing an algorithm where I try to find 5 vectors out of 20 which are "furthest apart", using some measure. To do that I use combnPrime, which gives me a list of some 77000 vectors representing all 5-vector grouped combinations.…

life_steal
2
votes
0 answers
Speeding up parallel SQL querying for R?
I have a dataframe df with an id column. Each id maps to many rows (1:n) in my database table. Querying each ID sequentially takes about an hour to complete, so I'm trying to run multiple queries at once using the doParallel package. There is overhead…

CorerMaximus
2
votes
1 answer
Setting cores via mc.cores vs. makePSOCKcluster?
I was wondering what the difference is between setting the number of cores for R to use via makePSOCKcluster and explicitly in the foreach loop? Should I be setting this separately in both instances, or is doing so when making the makePSOCKcluster…

CorerMaximus
2
votes
0 answers
Memory use in foreach grows until failure but object.size() shows no change
I am running a large parallel R function, which contains a for-loop to be executed on each core. The function starts running fine, but as the loop progresses, so does the computer's memory usage, until eventually it runs out of RAM and the process…

Dan Rosenheck
2
votes
2 answers
Produce a matrix using a foreach loop and parallel processing
I am trying to convert a for loop which I am currently using to run a process across a large matrix. The current for loop finds the maximum value within a 30 x 30 section and creates a new matrix with the maximum value.
The current code for the for…

chrischandler
2
votes
1 answer
R dopar foreach on chunks instead of per line
This question is specific to using parallel processing in R using foreach and dopar. I have created a simple dataset and a simple operation (the actual operation is more complex and hence I am presenting a simple operation here). The code for the…

Prometheus
2
votes
1 answer
R doParallel: couldn't find function
I have set up the following function:
cv_model <- function(dat, targets, predictors_name){
library(randomForest)
library(caret)
library(MLmetrics)
library(Metrics)
# set up error measures
sumfct <- function(data, lev = NULL, model =…

yPennylane
2
votes
1 answer
R - cpv (trotter package) and %dopar%
I'd like to know whether the cpv function within the trotter package works with %dopar%? I'm getting the following error:
task 1 failed - "object of type 'S4' is not subsettable"
Here's a small…

user10665650
2
votes
1 answer
R > how to conditionally append doPar loop result back to main result dataset
I am trying to implement a parallel loop in R, but I am not sure how to set it up so that it only returns (row-bind appends) a result to the main result dataset if a certain condition is met. That is, in some situations I do not want the…

Aaron Chan
2
votes
1 answer
Run breakpoint (lm) detection in parallel in R
I am doing about 80000 time series breakpoint detection calculations in R. I have all these extremely different time series where I cannot apply ARIMA models so I am calculating a linear model per time series, then extract the breakpoints and use…

Jonathan
2
votes
0 answers
R: parallel foreach on list names does not work, but regular for loop runs successfully
I'm trying to run PCA on two large datasets derived from the same parent dataset earlier in the script. I would like to perform the PCA in parallel on each of the objects, but for some reason I can't get it to work. The code block runs successfully…

Carmen Sandoval
2
votes
1 answer
R: doParallel (FORK), foreach and random number generation
When running a foreach loop using the doParallel package in R and FORK, each worker will start off with the same random seed thus leading to duplicate 'random' numbers.
set.seed(1)
cl <- makeCluster(2, type =…

Scholar
2
votes
3 answers
Improve performance for computing Weighted Jaccard in a large matrix
R input: a matrix (measures x samples) (2291 x 265) (matrix [i,j]=a value between 0 and 1)
Output: a symmetric similarity matrix of the weighted Jaccard computed between all the pairs of samples
Problem: find the fastest way to produce the output. I…

Luke
2
votes
0 answers
Controlling the number of CPUs used by registerDoParallel
I have recently inherited a legacy R script that at some point trains a gradient boost model with a large regression matrix. This task is parallelised using the doParallel::registerDoParallel function. Originally, the script started the parallel…

Luís de Sousa