R: Passing different-length inputs to purrr with nested data structures

Question

I have two lists foo and bar, where length(foo) > length(bar). I want to apply a function to each element of bar length(foo) times (once per element of foo), store the output of each application in its own list, and collect all of the applications for each element of bar in their own list. This nested output structure is important because I am passing it to functions that require nested lists. Examples of what I have and what I want are in the minimal example below.

While this works nicely with nested for-loops, I've been trying to accomplish it with purrr's map functions. I've managed it by (a) creating a list of length(bar) where each element is foo, (b) passing this new list and bar to an anonymous function in purrr::pmap(), and then (c) inside that function, iterating over foo with another anonymous function in purrr::map().

While this works, it seems very antithetical to the purpose of purrr:

Instead of .x, .y, and ~ syntax, I'm defining anonymous functions.
Instead of passing my raw lists (which vary in length), I'm converting one to a nested list to match the other's length. This can be memory-intensive, slow, etc.
I'm working with nested lists rather than flatter lists/dataframes I then partition into my desired data structure.

Is there a better way to work with different-length lists in purrr than my approach? How might I modify my code (minimal example below) to better exploit purrr's syntax? One idea (Handling vectors of different lengths in purrr) is to use cross() or some equivalent to generate a single object to pass to pmap(), but I'm not sure how to then generate the nested list structure.

library(purrr)

# Example data: 2 different-length lists
foo <- list(1, 2, 3)
bar <- list("df1", "df2")

# Desired output:
out <- list(list("df1_1", "df1_2", "df1_3"),
            list("df2_1", "df2_2", "df2_3"))

# Distinctive features of output:
#length(out) == length(bar)
#length(out[[1]]) == length(out[[2]])
#length(out[[1]]) == length(foo)

# Can use purrr::pmap but this will concurrently
# iterate through each element of inputs ("in
# parallel") so need to create same-length inputs
foo_list <- rep(list(foo), length(bar))

# Pass our inputs to pmap then use map to iterate
# over foo contained in each foo_list element.
purrr::pmap(list(foo_list, bar),
            function(foo, bar) {
              map(foo, function(i) {
                paste0(bar, "_", i)
              })
            })
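A possible simplification of the code above, offered as an editorial sketch rather than part of the original question: purrr's map2() (like pmap()) recycles length-1 inputs, so wrapping foo in list() pairs it with every element of bar without building foo_list via rep(). The argument names b, f, and i are illustrative.

```r
library(purrr)

foo <- list(1, 2, 3)
bar <- list("df1", "df2")

# list(foo) has length 1, so map2() recycles it to length(bar):
# every element b of bar is paired with the whole foo list f.
res <- map2(bar, list(foo), function(b, f) {
  # Inner loop over the elements of foo builds one inner list per b.
  map(f, function(i) paste0(b, "_", i))
})
```

This keeps the OP's two-level structure but removes the manual length-matching step.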

Can't you unlist and do this more easily? – akrun May 15 '21 at 00:15
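One reading of this suggestion (an assumption, since the comment does not spell it out) is to unlist() both inputs into atomic vectors and rely on paste0() being vectorized, so that only the loop over bar remains. The names foo_vec and bar_vec are illustrative.

```r
foo <- list(1, 2, 3)
bar <- list("df1", "df2")

foo_vec <- unlist(foo)  # numeric vector c(1, 2, 3)
bar_vec <- unlist(bar)  # character vector c("df1", "df2")

# paste0() is vectorized over foo_vec, so a single call per element of bar
# produces all of its strings; as.list() restores the inner list level.
res <- lapply(bar_vec, function(b) as.list(paste0(b, "_", foo_vec)))
```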

akrun · Accepted Answer · 2021-05-15T00:21:33.977


Consider using a nested map. Loop over the 'bar' list, then loop over 'foo' and paste. This returns a nested list matching the OP's expected output:

library(purrr)
out2 <- map(bar, ~ map(foo, function(y) paste0(.x, '_', y)))
identical(out, out2)
#[1] TRUE

The equivalent option in base R is:

lapply(bar, function(x) lapply(foo, function(y) paste0(x, '_', y)))

Or, with base R, we could use outer() to create a matrix of strings, split it by row (asplit() with MARGIN = 1) into a list of vectors, then loop over that list and convert each vector into a list with as.list():

out3 <- lapply(asplit(outer(bar, paste0('_', foo), FUN = paste0), 1), as.list)
identical(out, out3)
#[1] TRUE
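The outer() call is effectively a cross product of bar and foo, which ties back to the cross() idea raised in the question (note that purrr 1.0.0 deprecated cross() in favour of tidyr::expand_grid()). A minimal base R sketch of the "flat table, then partition" approach, building one row per combination with expand.grid() and splitting the results back into one list per element of bar, could look like this; the object and column names grid, foo, bar, label, and groups are illustrative.

```r
foo <- list(1, 2, 3)
bar <- list("df1", "df2")

# Flat table: one row per (foo, bar) combination. expand.grid() varies its
# first argument fastest, so the rows for each bar value stay in foo order.
grid <- expand.grid(foo = unlist(foo), bar = unlist(bar),
                    stringsAsFactors = FALSE)
grid$label <- paste0(grid$bar, "_", grid$foo)

# Partition the flat column into one group per element of bar (factor levels
# fixed to bar's order), turn each group into a list, and drop the names.
groups <- split(grid$label, factor(grid$bar, levels = unlist(bar)))
res <- unname(lapply(groups, as.list))
```

The flat form is convenient when intermediate results are easier to compute or inspect as a data frame; the final split() step recovers the nested structure.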

edited May 15 '21 at 00:21

answered May 15 '21 at 00:09

akrun



Excellent. The confusing part to me is whether .x refers to the elements of `foo` or the elements of `bar`. Seems somewhat ambiguous. – socialscientist May 15 '21 at 00:21

@user3614648 `.x` always refers to the element of the current call. When calls are nested, it is better to use a traditional anonymous function, where we have the flexibility to name the argument `x`, `y`, or any other name to tell them apart – akrun May 15 '21 at 00:22
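To illustrate the naming advice, here is a sketch of the same nested map with explicitly named arguments at both levels, so there is no `.x` to disambiguate; b and f are illustrative names. The second form uses the `\(x)` lambda shorthand, which assumes R 4.1 or later.

```r
library(purrr)

foo <- list(1, 2, 3)
bar <- list("df1", "df2")

# b is always an element of bar (outer map), f an element of foo (inner map).
res <- map(bar, function(b) map(foo, function(f) paste0(b, "_", f)))

# Same logic with the \(x) shorthand for function(x) (requires R >= 4.1).
res2 <- map(bar, \(b) map(foo, \(f) paste0(b, "_", f)))
```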
