
I need to flatten an arbitrarily nested list to a data frame and retain the path of keys / indices in one column, while extracting each element on the bottom level to an individual row.

Consider the following list:

lst <- list(
    animals = list(
        lamas = c("brown", "white"),
        primates = list(
            humans = c("asia", "europe"),
            apes = c("good", "fast", "angry")
        )
    ),
    objects = c("expensive", "cheap"),
    plants = NULL
)

The result of flatten_list(lst, delimiter = "_") should look like this:

data.frame(
  path = c("animals_lamas", "animals_lamas", "animals_primates_humans", "animals_primates_humans", "animals_primates_apes", "animals_primates_apes", "animals_primates_apes", "objects", "objects", "plants"),
  value = c("brown", "white", "asia", "europe", "good", "fast", "angry", "expensive", "cheap", NA)
)

I was surprised that I couldn't achieve this with tidyr or data.table. Do I need a recursive function, or is there an out-of-the-box solution for this? Any help is appreciated!

EDIT: The solution provided by akrun worked on the original data. I then realized that it breaks when an element at the bottom level is NULL, so I rephrased the problem.

EDIT2: My current workaround is to recursively replace NULL with NA before applying akrun's solution, using the function supplied here [again by akrun ;) ].
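
For illustration, the NULL-to-NA replacement described in EDIT2 could look roughly like the base R sketch below. The helper name null_to_na is hypothetical, and this is not necessarily the function linked above; it only assumes that every NULL, at any depth, should become a single NA.

```r
# Example list from the question.
lst <- list(
    animals = list(
        lamas = c("brown", "white"),
        primates = list(
            humans = c("asia", "europe"),
            apes = c("good", "fast", "angry")
        )
    ),
    objects = c("expensive", "cheap"),
    plants = NULL
)

# Hypothetical helper: walk the list recursively and swap every NULL for NA.
null_to_na <- function(x) {
    if (is.null(x)) return(NA)                       # NULL leaf becomes NA
    if (is.list(x)) return(lapply(x, null_to_na))    # recurse; lapply keeps names
    x                                                # any other leaf is returned unchanged
}

cleaned <- null_to_na(lst)
str(cleaned)
```

After this step, melting or unlisting no longer silently drops the plants entry.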

Comfort Eagle

2 Answers


A solution that can deal with NULL, based on rrapply:

library(tidyverse)
library(rrapply)

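# Replace NULL leaves with NA and melt the list into one row per leaf
rrapply(lst, f = \(x) if (is.null(x)) NA else x, how = "melt") %>% 
  # Expand each leaf vector to one row per element, then join the key columns
  unnest(value) %>% unite(path, L1:L3, na.rm = TRUE)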

#> # A tibble: 10 × 2
#>    path                    value    
#>    <chr>                   <chr>    
#>  1 animals_lamas           brown    
#>  2 animals_lamas           white    
#>  3 animals_primates_humans asia     
#>  4 animals_primates_humans europe   
#>  5 animals_primates_apes   good     
#>  6 animals_primates_apes   fast     
#>  7 animals_primates_apes   angry    
#>  8 objects                 expensive
#>  9 objects                 cheap    
#> 10 plants                  <NA>
PaulS

This can be done by melting the list into a data.frame and then uniting the key columns:

library(reshape2)
library(dplyr)
library(tidyr)
out2 <- melt(lst) %>% 
        unite(path, L1:L3, sep = "_", na.rm = TRUE) %>% 
        select(path, value)

Checking against the OP's expected output (stored here as out):

> all.equal(out, out2)
[1] TRUE

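We may also do this with unlist and stack from base R. Note that unlist joins nested names with "." and appends an index to the elements of vectors longer than one, so the paths differ from the format requested above, and the lapply call only replaces NULL at the top level: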

stack(unlist(lapply(lst, \(x) if(is.null(x)) NA_character_ else x)))[2:1]
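
To the question of whether a recursive function is needed: a small one in base R is enough. The sketch below is one possible flatten_list, not a tested library function. It assumes that NULL (or any zero-length element) should yield a single NA row, that all values can be coerced to character, and that unnamed elements should fall back to their positional index; the prefix argument is an internal helper introduced for this illustration.

```r
lst <- list(
    animals = list(
        lamas = c("brown", "white"),
        primates = list(
            humans = c("asia", "europe"),
            apes = c("good", "fast", "angry")
        )
    ),
    objects = c("expensive", "cheap"),
    plants = NULL
)

flatten_list <- function(x, delimiter = "_", prefix = NULL) {
    # Leaf case: empty/NULL becomes one NA, atomic vectors give one row per element.
    if (length(x) == 0 || !is.list(x)) {
        values <- if (length(x) == 0) NA_character_ else as.character(x)
        path <- paste(prefix, collapse = delimiter)
        return(data.frame(path = rep(path, length(values)),
                          value = values,
                          stringsAsFactors = FALSE))
    }
    # Use names where present, otherwise fall back to positional indices.
    keys <- names(x)
    if (is.null(keys)) keys <- rep("", length(x))
    missing_key <- is.na(keys) | keys == ""
    keys[missing_key] <- as.character(seq_along(x))[missing_key]
    # Recurse into each child, extending the path by that child's key.
    parts <- lapply(seq_along(x), function(i) {
        flatten_list(x[[i]], delimiter, c(prefix, keys[i]))
    })
    out <- do.call(rbind, parts)
    rownames(out) <- NULL
    out
}

result <- flatten_list(lst, delimiter = "_")
print(result)
```

Because the path is built while descending, this works for any nesting depth and does not need the key columns L1:L3 to be named in advance.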
akrun
  • Thank you, that solves the problem with the test data. I just realized that I've tried this solution before, but it doesn't work with my actual data, since some elements at the bottom level are `NULL`. Hope you don't mind if I rephrase the problem – Comfort Eagle May 13 '22 at 21:54
  • @ComfortEagle i see your edit stating that you were able to fix it – akrun May 14 '22 at 15:58