R: Converting an Existing Function into Parallel

Question

I am working with the R programming language.

Suppose I have the following data:

set.seed(123)
n <- 100000
df <- data.frame(longitude = runif(n, -180, 180),
                 latitude = runif(n, -90, 90),
                 color = sample(c("red", "blue", "green", "orange", "purple", "yellow", "pink", "black", "white", "grey"), n, replace = TRUE))

I am trying to run the following code, which finds the convex hull of the points within each color class. To do this, I am using the chull() function (from the grDevices package that ships with R) together with lapply(), and then the sf package to convert the hull points and write them to a shapefile:

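library(sf)  # provides st_as_sf() and st_write()

# For each color, get the row indices (within that color's subset) of the hull vertices
hulls <- lapply(unique(df$color), function(color) {
  chull(df[df$color == color, c("longitude", "latitude")])
})

# Turn each color's hull vertices into an sf object of points
hull_sfs <- lapply(seq_along(hulls), function(i) {
  st_as_sf(df[df$color == unique(df$color)[i], ][hulls[[i]], ],
           coords = c("longitude", "latitude"), crs = 4326)
})

# Stack the per-color results and write them out
hull_sf_combined <- do.call(rbind, hull_sfs)

st_write(hull_sf_combined, "hulls.shp")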

My Question: I would like to explore ways to make this code run faster. For instance, can I use packages such as parallel, doSNOW and foreach, or functions such as mclapply(), to speed it up?

But I am not sure where to begin - can someone please show me how to do this?

Thanks!

Hi @stats_noob, I'd take a look at [How do I parallelize this lapply() function in R?](https://stackoverflow.com/questions/74706306/how-do-i-parallelize-this-lapply-function-in-r/74709053) and [mclapply() chokes when elements to be parallelized on are too big](https://stackoverflow.com/questions/75899623/mclapply-chokes-when-elements-to-be-parallelized-on-are-too-big-how-to-get-a). Hopefully those give you a sense of how to do this. They should also show that the overhead of passing lots of data between worker processes can mean parallelizing is not worth it, and I suspect you will find that to be the case here. - SamR, Aug 07 '23 at 11:32
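To make the overhead point concrete, here is a minimal sketch (not a tested answer to the question) that runs the hull step both serially and on a small PSOCK cluster from the parallel package. It assumes the same simulated data as in the question. The helper name hull_rows, the use of split() to send each worker only its own color group, and the choice of 2 workers are illustrative assumptions, not something from the original post.

```r
library(parallel)

# Same simulated data as in the question
set.seed(123)
n <- 100000
colors <- c("red", "blue", "green", "orange", "purple",
            "yellow", "pink", "black", "white", "grey")
df <- data.frame(longitude = runif(n, -180, 180),
                 latitude  = runif(n, -90, 90),
                 color     = sample(colors, n, replace = TRUE))

# Split once, so each task (and each worker) only sees one color's coordinates
groups <- split(df[, c("longitude", "latitude")], df$color)

# Hypothetical helper: return the rows of one group that lie on its convex hull
hull_rows <- function(g) {
  g[grDevices::chull(g$longitude, g$latitude), ]
}

# Serial baseline
t_serial <- system.time(serial_res <- lapply(groups, hull_rows))

# Parallel version on a PSOCK cluster (also works on Windows);
# 2 workers is an arbitrary choice for illustration
cl <- makeCluster(2)
t_parallel <- system.time(parallel_res <- parLapply(cl, groups, hull_rows))
stopCluster(cl)

# Compare elapsed times; the parallel time includes shipping the data to workers
print(t_serial)
print(t_parallel)

# Both approaches should give the same hull vertices
print(isTRUE(all.equal(serial_res, parallel_res)))
```

With only ten groups and a fast chull() call per group, the time spent serializing the data to the workers may well exceed the time saved, so it is worth comparing the two timings on your own machine before committing to a parallel version. On Linux or macOS, mclapply(groups, hull_rows, mc.cores = 2) would be the fork-based equivalent.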
