Questions tagged [mclapply]

mclapply is a parallelized version of lapply. It returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
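
As a rough illustration of the description above, the following minimal sketch compares lapply with mclapply from the parallel package. The input vector `x`, the helper `square`, and the choice of two cores are assumptions made only for this example; forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical input and function, chosen only for illustration.
x <- 1:4
square <- function(i) i^2

# mclapply relies on forking, which Windows lacks; mc.cores must be 1 there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Serial and parallel versions take the same X and FUN.
serial_result <- lapply(x, square)
parallel_result <- mclapply(x, square, mc.cores = cores)

# The result is a list with one element per element of x.
length(parallel_result) == length(x)
identical(serial_result, parallel_result)
```

Because each element is computed in a forked child process and the value is sent back to the parent, only the returned list carries results back; side effects inside FUN stay in the child.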

136 questions
2
votes
1 answer

mclapply cores spending lots of time in uninterruptible sleep

This is a somewhat generic question, for which I apologize, but I can't generate a code example that reproduces the behavior. My question is this: I'm scoring a largish data set (~11 million rows with 274 dimensions) by subdividing the data set into…
TomR
2
votes
0 answers

Modify variables outside function using mclapply

It is easy to modify variables outside a function using assign() or <<-, even if the function is called with lapply(). But these tricks do not seem to work when the function is called with mclapply(), the parallel version of lapply() in package…
Ali
2
votes
1 answer

Is there a faster way to apply logical operations to subset a large dataset in R?

First post on Stack Overflow, so be gentle if I don't get the etiquette quite right. I have a big data frame (well, seven of them actually, but that isn't important) containing hands drawn from a deck of cards. I have another array that goes with it,…
Bill Beesley
2
votes
2 answers

mclapply vs for loops for plotting: speed and scalability focus

I am running a function in R that can take a long time to run, as it carries out multiple commands to transform and subset some data before passing it to ggplot for plotting. I need to run this function multiple times, adjusting the argument values.…
h.l.m
1
vote
0 answers

R mclapply return serialization object size limit using version 4.3.0 on Apple M1 aarch64-apple-darwin20

The mclapply help page says: "Prior to R 3.4.0 and on a 32-bit platform, the serialized result from each forked process is limited to 2^31 - 1 bytes." I have this problem even with version 4.3.0 on a 64-bit Apple M1 Pro laptop using a 64-bit build…
H. Johnson
1
vote
0 answers

How to make mclapply in Rscript maximize use of all available Linux cores?

I'm reading in a Parquet file with ~1 million rows, wrangling each row, and writing out CSVs. The data wrangling itself is quite simple: I select all rows of a UserID (of which there are several for each UserID in random order within the dataframe)…
1
vote
2 answers

In parallel processing, select all the rows which contain a specific keyword in R

My data (df) contains ~2,0000K rows and ~5K unique names. For each unique name, I want to select all the rows from df which contain that specific name. For example, the data frame df looks as follows: id names 1 A,B,D 2 A,B 3 A 4 B,D 5 …
user3642360
1
vote
1 answer

Parallel processing in R with "parallel" package - unpredictable runtime

I've been learning to parallelize code in R using the parallel package, and specifically the mclapply() function with 14 cores. Something I noticed, just from a few runs of code, is that repeated calls of mclapply() (with the same arguments and same…
bob
1
vote
1 answer

How to control a potential fork bomb caused by mclapply? Tried ulimit but it didn't work

I am using mclapply in my R script for parallel computing. It reduces overall memory usage and it is fast, so I want to keep it in my script. However, one thing I noticed is that the number of child processes generated while running the script is more…
Yida Zhang
1
vote
1 answer

Unable to parallelize a function with multiple arguments in R

I tried to parallelize a simple function which adds two numbers and prints the result in R, using mclapply from the parallel library. This is my code: library(doParallel) t = list(list(1,1),list(2,2),list(3,3)) f <- function (a,b){ print(a +…
Roshin Raphel
1
vote
1 answer

R: loading multiple RData with mclapply doesn't work

I wanted to load multiple RData in one command, as explained by Johua using > lapply(c(a_data, b_data, c_data, d_data), load, .GlobalEnv) [[1]] [1] "nRTC_Data" [[2]] [1] "RTA_Data" [[3]] [1] "RTC_Data" [[4]] [1] "RTA_Data" > rm(a_data, b_data,…
1
vote
1 answer

Load different workspaces with the same variable names without overwriting existing objects

I have a pipeline that requires loading several .RData files. However, these files all contain the same variable names (say, ls() = c(df1, df2)), and since these files are big, I decided to use mclapply(c(a.RData, b.RData, c.RData), load,…
1
vote
1 answer

Why don't parallel jobs print in RStudio?

Why do scripts parallelized with mclapply print on a cluster but not in RStudio? Just asking out of curiosity. mclapply(1:10, function(x) { print("Hello!"); return(TRUE) }, mc.cores = 2) # Hello prints in Slurm but not RStudio
Jeff Bezos
1
vote
1 answer

mclapply with points when plotting in R

According to the R documentation, mclapply() is the parallelized version of lapply(), but in this simple example mclapply() does not work when used with points(). Any solution? plot(c(0,3),c(0,1000), type='n') x<-runif(100,0,1000);…
1
vote
1 answer

How to write efficient nested functions for parallelization?

I have a dataframe with two grouping variables, class and group. For each class, I have a plotting task per group. Mostly, I have 2 levels per class and 500 levels per group. I'm using the parallel package for parallelization and the mclapply function for…
Archymedes