3

I have the following data.table in R:

> data.table(value="one",list1=list(c(list("one1"),list("one2"),list(c("one3 1","one3 2")))),position=list(c(0,1,2)))
     value     list1  position
  1:   one <list[3]>     0,1,2

where the <list[3]> element is

[[1]]
[[1]][[1]]
[1] "one1"

[[1]][[2]]
[1] "one2"

[[1]][[3]]
[1] "one3 1" "one3 2"

I want to lengthen the data.table so that I have

value   list1     position
one     "one1"    0
one     "one2"    1
one     "one3 1"  2
one     "one3 2"  2

where "one3 2" corresponds to position 2. So far, all my attempts result in position 3 being listed for "one3 2". Is there a fix for this?

ThomasIsCoding
user321627

4 Answers

4

Here's one option, assuming this is your actual dataset; I'm not sure what other logic might apply to your data:

library(data.table)
library(tidyverse)

dt <- data.table(
  value="one",
  list1=list(c(list("one1"),list("one2"),list(c("one3 1","one3 2")))),
  position=list(c(0,1,2))
  )

dt %>% 
  tidyr::unnest_longer(c(list1, position)) %>% 
  tidyr::unnest_longer(list1) %>% 
  dplyr::mutate(temp_index = readr::parse_number(list1) - 1) %>% 
  dplyr::filter(position == temp_index) %>% 
  dplyr::select(-temp_index)

#> # A tibble: 4 × 3
#>   value list1  position
#>   <chr> <chr>     <dbl>
#> 1 one   one1          0
#> 2 one   one2          1
#> 3 one   one3 1        2
#> 4 one   one3 2        2
Matt
  • Unnest can handle multiple columns provided each unnests to equal length, so you can simplify this solution to just two unnest calls: `dt %>% unnest_longer(c(list1, position)) %>% unnest_longer(list1)`. – Ritchie Sacramento Apr 21 '23 at 04:03
4

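Using data.table (here `dt1` is the data.table from the question):

library(data.table)
# repeat each position once per string in the matching list1 element
dt1[, .(value, list1 = unlist(list1),
        position = c(mapply(\(x, y) rep(x, lengths(y)), position, list1)))]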

Output:

    value  list1 position
1:   one   one1        0
2:   one   one2        1
3:   one one3 1        2
4:   one one3 2        2
akrun
  • Thank you, I just realized my `dt1` has `list1` values which are empty lists. This then results in the `Error in rep(x, lengths(y)) : invalid 'times' argument` error. Would you have any ideas how I can account for empty lists which then result in 0 lengths? – user321627 Apr 21 '23 at 06:08
  • @user321627 do you want to remove those empty cases or return NA – akrun Apr 21 '23 at 06:43
  • @user321627 can you show an example. This example `dt2 <- data.table(value="one",list1=list(c(list("one1"),list(NULL),list(c("one3 1","one3 2")))),position=list(c(0,1,2)))` works though with the code – akrun Apr 21 '23 at 06:45
  • @user321627 can you try `yourdat[, c(.(value = value), lapply(seq_along(list1), \(i) {x <- list1[[i]]; l1 <- lengths(x); x[l1==0] <- NA_character_; data.table(list1 = unlist(x), position = rep(position[[i]], lengths(x)))}))]` – akrun Apr 21 '23 at 06:56
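The empty-element case raised in the comments above can be sketched as follows. This is only an illustration under stated assumptions: `dt2` is a hypothetical two-row table (not the asker's real data), `value` is assumed to identify each row uniquely so it can serve as the grouping key, and empty elements are assumed to be wanted as `NA` rather than dropped.

```r
library(data.table)

# Hypothetical example: the second row has an empty element in list1.
dt2 <- data.table(
  value    = c("one", "two"),
  list1    = list(list("one1", "one2", c("one3 1", "one3 2")),
                  list("two1", character(0))),
  position = list(c(0, 1, 2), c(0, 1))
)

# Assumption: `value` is unique per row, so grouping by it processes one row at a time.
res <- dt2[, {
  x <- list1[[1]]                    # inner list for this row
  n <- lengths(x)                    # number of strings per element
  x[n == 0] <- list(NA_character_)   # keep empty elements as NA
  .(list1 = unlist(x),
    position = rep(position[[1]], pmax(n, 1L)))  # one position per output string
}, by = value]

print(res)
```

With this input the result should have six rows, and the empty element of `"two"` should appear as an `NA` at position 1. If `value` is not unique in your data, a separate row id column would be needed as the grouping key instead.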
4

You can use `unnest` (from tidyr) twice, as shown below:

library(tidyr)

dt %>%
  unnest(c(list1, position)) %>%
  unnest(list1)

which gives

# A tibble: 4 × 3
  value list1  position
  <chr> <chr>     <dbl>
1 one   one1          0
2 one   one2          1
3 one   one3 1        2
4 one   one3 2        2
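If some elements of `list1` can be empty, the second `unnest` would drop those rows by default. A minimal sketch of one way around that, assuming a hypothetical two-row tibble `df2` (not the asker's data) and that empty elements should be kept as `NA`, is to pass `keep_empty = TRUE` to the second call:

```r
library(tidyr)
library(tibble)

# Hypothetical example: the second row has an empty element in list1.
df2 <- tibble(
  value    = c("one", "two"),
  list1    = list(list("one1", "one2", c("one3 1", "one3 2")),
                  list("two1", character(0))),
  position = list(c(0, 1, 2), c(0, 1))
)

res2 <- df2 %>%
  unnest(c(list1, position)) %>%      # pair each inner element with its position
  unnest(list1, keep_empty = TRUE)    # expand strings; empty elements become NA

print(res2)
```

The first `unnest` works because `list1` and `position` have the same length within each row; only the second step changes the row count unevenly.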
ThomasIsCoding
4

Here is another funky approach relying only on data.table:

dt[, .(value, list1 = unlist(list1, recursive = FALSE), position = unlist(position))][, .(list1 = unlist(list1)), by = .(value, position)]
#>    value position  list1
#> 1:   one        0   one1
#> 2:   one        1   one2
#> 3:   one        2 one3 1
#> 4:   one        2 one3 2

or a bit simpler:

dt[, .(value, list1 = list1[[1]], position = position[[1]])][, .(list1 = unlist(list1)), by = .(value, position)]
#>    value position  list1
#> 1:   one        0   one1
#> 2:   one        1   one2
#> 3:   one        2 one3 1
#> 4:   one        2 one3 2

Joris C.