
I have a data frame on which I would like to perform multiple operations. Here is an example to illustrate, in this case creating a list of plots:

library(tidyverse)

plot_fun = function(data, geom) {

  plot = ggplot(data, aes(x = factor(0), y = Sepal.Length))

  if (geom == 'bar') {
    plot = plot + geom_col()
  } else if (geom == 'box') {
    plot = plot + geom_boxplot()
  }

  plot +
    labs(x = unique(data$Species)) +
    theme_bw() +
    theme(axis.text.x = element_blank())

}

As you can see, this function takes a data frame and produces one of two types of plot, depending on the geom parameter.

In my real problem, I have to split the data frame by one or more factors and then do the work on each piece. Don't worry about the specifics of this example (I know I could put iris$Species on the x-axis).

iris_ls = split(iris, iris$Species)
geom_ls = c('bar', 'box')

lapply(geom_ls, function(g) {
  lapply(iris_ls, function(x) {
    plot_fun(x, g)
  })
})

My problem is that, if I want to create both types of plot, I have to write a nested lapply (which performs badly when parallelizing).

So my question is: how can I avoid the nested lapply? Should I multiply the length of iris_ls by the length of the geom_ls vector? I do not know how to approach this. Imagine I have multiple geom-like parameters in my function.

PS: Using drop = TRUE in split does not drop the unused factor levels within each element of the list. I don't know if that is the correct way to do it; I have to use another lapply for that.

Waldi
Archymedes
  • If you want to create both plots for each data.frame, why not adapt your function so that it creates both plots by default? Then you would only need 1 `lapply` call. – milanmlft Jun 09 '20 at 12:12
  • I know I could use %in% instead of ==, and return list(plot_bar, plot_box), but my real example is much more complicated (not plotting-related in all cases), so that's not possible. I'm taking a look at utils::combn to build all possible combinations of data frames with geom-like parameters... – Archymedes Jun 09 '20 at 12:17
  • You can simply add `data <- droplevels(data)` inside `plot_fun` can't you? Why do you think nested `lapply` will slow you down? Unless your lists are very long most of your processing time will be in plotting won't it? – Chuck P Jun 09 '20 at 12:20
  • Thanks Chuck P, but this is related to another question I asked: https://stackoverflow.com/questions/60581419/how-to-write-efficient-nested-functions-for-parallelization. About the drop argument: yes, that's the correct way to do it, sorry, another lapply is not necessary. But then I do not know the purpose of that argument in the split function. – Archymedes Jun 09 '20 at 12:22

1 Answer


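Use the purrr package to build every combination of data frame and geom as one flat list (note that purrr::cross() is deprecated as of purrr 1.0.0 in favour of tidyr::expand_grid()):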

cross_ls  <- purrr::cross(list(iris = split(iris, iris$Species),
                               geom = list('bar', 'box')))

cross_ls %>% purrr::map(~{plot_fun(.x$iris,.x$geom)})
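
If you would rather not depend on purrr, the same flattening can be sketched in base R. This is only an illustration under some assumptions: work_fun is a hypothetical stand-in for plot_fun (any function of a data frame and a geom-like parameter would do), and the names grid and res are made up for the example. It also answers the "multiply the lengths" part of the question: the flat task list has length(iris_ls) * length(geom_ls) entries.

```r
# Hypothetical stand-in for plot_fun(): any function of (data, geom) works here.
work_fun <- function(data, geom) {
  sprintf("%s / %s: %d rows", as.character(data$Species[1]), geom, nrow(data))
}

iris_ls <- split(iris, iris$Species)
geom_ls <- c("bar", "box")

# One row per (data frame, geom) pair: 3 species x 2 geoms = 6 tasks.
grid <- expand.grid(species = names(iris_ls), geom = geom_ls,
                    stringsAsFactors = FALSE)

# A single flat loop over the rows of the grid replaces the nested lapply.
res <- lapply(seq_len(nrow(grid)), function(i) {
  work_fun(iris_ls[[grid$species[i]]], grid$geom[i])
})
names(res) <- paste(grid$species, grid$geom, sep = "_")
```

More geom-like parameters would simply become extra columns in expand.grid(). Because there is now only one loop, it should be possible to swap lapply for a parallel equivalent such as parallel::mclapply without restructuring anything.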

or, in its parallel version:

library(furrr)
plan(multisession)  # plan(multiprocess) is deprecated in recent versions of future

cross_ls %>% furrr::future_map(~{plot_fun(.x$iris,.x$geom)})
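
Regarding the PS in the question, here is a small sketch of what drop = TRUE in split actually does, and how droplevels handles the per-element levels instead. The vector f and the variable names are made up for illustration.

```r
# drop = TRUE only removes groups that would be empty, i.e. factor levels
# that never occur in the splitting factor.
f <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
n_groups_keep <- length(split(1:3, f))               # includes the empty "c" group
n_groups_drop <- length(split(1:3, f, drop = TRUE))  # empty "c" group removed

# It does not touch factor columns inside each piece: Species keeps all
# three levels in every element of the list.
iris_ls <- split(iris, iris$Species, drop = TRUE)
levels_before <- nlevels(iris_ls$setosa$Species)

# droplevels() removes the unused levels from each data frame.
iris_ls <- lapply(iris_ls, droplevels)
levels_after <- nlevels(iris_ls$setosa$Species)
```

Calling droplevels(data) at the top of plot_fun, as suggested in the comments, has the same effect without the extra lapply.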
Waldi