-2

I'm working on Google Analytics Sample data, it comes straight from BigQuery.

Data Source -> https://support.google.com/analytics/answer/7586738?hl=en

The data has lots of nested columns, with type "list". How can I unnest those columns and shape the data for smooth analysis?

Using unnest() opens up the whole column in longer shape, and takes away the column names as well for the data in the list. It's hard to build my analysis and viz on data like that with no column names. Or any other way to see those columns in list and access them?

EDIT: I want to unnest the nested columns without losing their names.

A snippet of columns and code that I used

Thanks.

2 Answers2

0

I am not 100 % sure if I understood your problem. Of course, data frames with neseted columns can be unpleasant. Still it's quite an efficient way to display the data. As I am not sure which part of the data you want to analyse, or how the lists in your data.frame look like (I did not want to download the data), my answer won't be too specific.

But maybe you could reduce your data only to the variables you need for an analysis and the reshape it accordingly ... Maybe something like that ...

x <- dplyr::tibble(ID=c(1,2,3),
              VAR1=list(c(1,2,3,4),
                        c(3,3,3,3),
                        c(1,3,1,2))) 
# A tibble: 3 x 2
     ID VAR1     
  <dbl> <list>   
1     1 <dbl [4]>
2     2 <dbl [4]>
3     3 <dbl [4]>

lapply(seq(1,nrow(x)),function(idx){
    dplyr::as_tibble(x$VAR1[[idx]]) %>%
        dplyr::mutate(ID=x$ID[idx])}) %>%
    dplyr::bind_rows()

# A tibble: 12 x 2
   value    ID
   <dbl> <dbl>
 1     1     1
 2     2     1
 3     3     1
 4     4     1
 5     3     2
 6     3     2
 7     3     2
 8     3     2
 9     1     3
10     3     3
11     1     3
12     2     3

UPDATE: keep names

The solution to keep the names is quite similar to the one above. The big difference is to use dplyr::tibble instead af dplyr::as_tibble (no idea why I used the latter in the first place).

# some fake data    
x <- dplyr::tibble(ID=c(1,2,3),
                       VAR1=list(c(1,2,3,4) %>% magrittr::set_names(c("A","B","C","D")),
                                 c(3,3,3,3) %>% magrittr::set_names(c("E","F","G","H")),
                                 c(1,3,1,2) %>% magrittr::set_names(c("I","J","K","L")))) 

# unnesting variable
y <- lapply(seq_len(nrow(x)), function(idx){
  dplyr::tibble(VAR1=x$VAR1[[idx]],
                ID=x$ID[idx])}) %>%
  dplyr::bind_rows()

> y$VAR1
A B C D E F G H I J K L 
1 2 3 4 3 3 3 3 1 3 1 2 
sambold
  • 807
  • 5
  • 15
  • I want to unnest the nested columns without losing the names they have. – Rohan Kataria Aug 26 '20 at 15:31
  • I tried to update my answer and I hope it will help you to solve your problem. If it's still not what you're looking for, maybe you could provide some small example data that illustrates your problem in a little more detail? – sambold Aug 27 '20 at 06:53
0

I used the following answer -

data %>% unnest_wider(nested_column, names_sep = "_")