I have a function like the following:
fxn <- function(X) {
  data <- replicate(10, rnorm(10000))
  clusters <- kmeans(data, X)
  write.csv(clusters$cluster, paste0("kmeans", X, ".csv"))
}
I want to use mclapply to run it in parallel over several values of X:
list <- list(10, 50, 100, 150, 200, 250, 300)
mclapply(list, fxn, mc.cores = 8)
This is a very simplified version of my function and use case, but I want to use it to clarify how environments are handled when a user-defined function is run through mclapply.
Because this is being processed in parallel on the same RAM, I was wondering whether the mclapply function could get "confused" at some point and mix up either data or clusters for a different parameter (as defined in list), by overwriting data and clusters and using a variable which was made using the wrong X. I am aware that each function maintains its own environment, but as the same function is being used several times at once, I want to confirm how this works.
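To make the concern concrete, here is a minimal sketch of the kind of check I have in mind. It assumes a Unix-like system, where mclapply forks a separate worker process for the parallel calls. The names check_fxn, params, and n_cores are hypothetical and not part of my real code; the local variable data is built only from the X passed to that call, so any mix-up between calls would show up as a mismatch.

```r
library(parallel)

# Hypothetical check function (not my real code): each call builds its own
# local `data` from X and returns it together with the X it was called with.
check_fxn <- function(X) {
  data <- rep(X, 5)   # local variable derived only from this call's X
  Sys.sleep(0.1)      # give the parallel calls a chance to overlap in time
  list(X = X, data = data, pid = Sys.getpid())
}

params <- list(10, 50, 100, 150)

# mc.cores > 1 relies on forking, which is not available on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

results <- mclapply(params, check_fxn, mc.cores = n_cores)

# For every result, check that `data` only contains values built from its own X
ok <- vapply(results, function(r) all(r$data == r$X), logical(1))
print(ok)
```

If data could be overwritten by another call running at the same time, I would expect some entries of ok to be FALSE.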
I would really appreciate it if you could clarify this for me or point me in the right direction.
Thanks!