How can I best flatten a nested list to a data.frame in R?

Question

I would like to convert information from a rest api, which comes in form of json, to a data.frame. The list is nested and theoretically I could repeatedly call purrr::flatten() to get to the bottom of the list and then extract the information using for example purrr:::map_dfr and magrittr:::extract. However, this is very domain specific and doesn't work nicely when extracting information from multiple "hierarchies". I have the following set-up in R:

library(rjson)

url <- "https://api3.geo.admin.ch/rest/services/api/SearchServer?searchText=Avenue de Lavaux 63, 1009 Pully&origins=address&type=locations"
result <- rjson::fromJSON(file = URLencode(url))

Two questions arise:

How can I nicely extract attributes like detail, x and y and write them to a data.frame?
On top of that, how can I directly extract values by their names. That is how to extract the values for weight, x, y, and detail.

Thank you very much.

@ RAB Yes, sorry for that. I corrected it. – Patrick Balada Jul 31 '19 at 06:51 — Patrick Balada, Jul 31 '19 at 06:51

morgan121 · Accepted Answer · 2019-07-31T07:01:08.533

You can unlist the result and extract x and y like this:

res <- unlist(result)
res['results.attrs.x']
# results.attrs.x 
#  "151398.09375"

res['results.attrs.y']
# results.attrs.y 
#  "540429.3125"

You can get the names of all other values like this:

names(res)
#[1] "results.id"  "results.weight"  "results.attrs.origin"         
#    "results.attrs.geom_quadindex" "results.attrs.zoomlevel"     
#[6] "results.attrs.featureId" "results.attrs.lon" "results.attrs.detail"   
#    "results.attrs.rank" "results.attrs.geom_st_box2d" "results.attrs.lat"
#    "results.attrs.num" "results.attrs.y" "results.attrs.x"  "results.attrs.label"

Then you can combine them in a dataframe:

res_df <- data.frame(
  X = res['results.attrs.x'],
  Y = res['results.attrs.y']
)

score 1 · Answer 2 · answered Jul 31 '19 at 07:12

I would unlist it too. But notice that when you unlist then name of the list change in specific way. For example, result$results[[1]]$id becomes results.id , result$results[[1]]$weight becomes results.weight. We can use this property to define keys of interest and extract them in separate list using lapply. We can then convert the list into separate dataframe.

result1 <- unlist(result)
keys <- c("detail", "x", "y", "weight")

df1 <- as.data.frame(lapply(keys, function(x) 
          unname(result1[grepl(paste0("\\.", x), names(result1))])))
names(df1) <- values

df1
#                                           detail            x           y weight
#1 avenue de lavaux 63 1009 pully 5590 pully ch vd 151398.09375 540429.3125      7

@ Ronak Shah Thank you very much for your reply. That was one concern I had (and didn't specifically mention). I will gladly use your approach. — Patrick Balada, Jul 31 '19 at 07:15

score 1 · Answer 3 · answered Jul 31 '19 at 07:16

It might be easier, and more useful, to extract all of the data. This gets around the problem of the long variable names by simply extracting the names to a new data frame variable, spreading the attributes using these names, and then unnesting the values:

library(tidyverse)

as_tibble(result$results[[1]]) %>% 
    mutate(attr_names = names(attrs)) %>% 
    spread(attr_names, attrs) %>% 
    unnest()

Which will return a data frame like the following:

# A tibble: 1 x 15
       id weight detail  featureId geom_quadindex geom_st_box2d  label   lat   lon   num origin  rank      x      y zoomlevel
    <dbl>  <dbl> <chr>   <chr>     <chr>          <chr>          <chr> <dbl> <dbl> <dbl> <chr>  <dbl>  <dbl>  <dbl>     <dbl>
1 2172570      7 avenue… 785542_0  0212222220211… BOX(540429.29… Aven…  46.5  6.66    63 addre…     7 1.51e5 5.40e5        10

Thank you gersht. That's a beautiful solution and even better since I prefer to use dplyr. — Patrick Balada, Jul 31 '19 at 07:21

score 0 · Answer 4 · answered Jul 31 '19 at 14:41

0

Based on and inspired from your answers, I use the following pipe now

library(tidyverse)

result %>%
   pluck("results", 1) %>%
   flatten() %>% 
   as_tibble() %>% 
   select(id, weight, detail, x, y)

answered Jul 31 '19 at 14:41

Patrick Balada

1,330
1
18
37

Selsheikh · Answer 5 · 2022-02-16T16:39:19.937

Edit: not a direct answer to the question, but I think it is a generally helpfull approach.

I was researching this question the last couple of days. I think the best solution is the unlist(). This is not dependant on the nesting and preserves names.

library(tidyverse)
result %>% unlist() %>% 
tibble(names = names(.), values = .) %>% # take names as column in tibble
mutate(
    n_seps = str_count(names, "\\.") # count separators to separate to unknown number of columns
) %>%
separate(names,str_c(
        "col_",
        c(1:(max(.$n_seps) +1)
        )), sep = "\\."
        ) %>% 
select(-n_seps)

Only caveat is that if the column names already contain ".", this will add a nesting level per ".". I found a solution to rename lists recursively: https://stackoverflow.com/a/63075776/14604591.

output:

result <- rjson::fromJSON(file = "c:/Users/elshikh/Downloads/SearchServer.json")


mydailyhelpers::tidy_nested_lists(result)
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 2 rows [1, 2].
#> # A tibble: 15 x 4
#>    col_1   col_2  col_3          values                                         
#>    <chr>   <chr>  <chr>          <chr>                                          
#>  1 results id     <NA>           2692129                                        
#>  2 results weight <NA>           1                                              
#>  3 results attrs  origin         address                                        
#>  4 results attrs  geom_quadindex 021222222021110121000                          
#>  5 results attrs  zoomlevel      10                                             
#>  6 results attrs  featureId      785542_0                                       
#>  7 results attrs  lon            6.66245317459106                               
#>  8 results attrs  detail         avenue de lavaux 63 1009 pully 5590 pully ch vd
#>  9 results attrs  rank           7                                              
#> 10 results attrs  geom_st_box2d  BOX(540428.897 151398.511999999,540428.897 151~
#> 11 results attrs  lat            46.5112342834473                               
#> 12 results attrs  num            63                                             
#> 13 results attrs  y              540428.875                                     
#> 14 results attrs  x              151398.515625                                  
#> 15 results attrs  label          Avenue de Lavaux 63 <b>1009 Pully</b>

^{Created on 2022-02-16 by the reprex package (v2.0.1)}

How can I best flatten a nested list to a data.frame in R?

5 Answers5

Linked