Given the following data frame:
df <-
  data.frame(one_letter = rep("a", 5),
             other_letters = letters[2:6])
df
#>   one_letter other_letters
#> 1          a             b
#> 2          a             c
#> 3          a             d
#> 4          a             e
#> 5          a             f
I want to combine both columns into one and keep only the unique values, to get:
#> all_letters_combined
#> 1 a
#> 2 b
#> 3 c
#> 4 d
#> 5 e
#> 6 f
Although I could use dplyr and tidyr to do the following:
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
# yes, it gets the job done
df %>%
  pivot_longer(everything()) %>%
  select(value) %>%
  unique()
#> # A tibble: 6 x 1
#> value
#> <chr>
#> 1 a
#> 2 b
#> 3 c
#> 4 d
#> 5 e
#> 6 f
I'm nevertheless looking for a faster, more direct way to do it, because speed becomes an issue when df is a tibble with list-columns that contain data frames. Here's an example, although still fairly minimal:
library(nycflights13)
library(babynames)
library(tictoc)
bigger_tib <-
  tibble(one_df = rep(list(babynames), 10),
         other_dfs = list(starwars, flights, mtcars, trees, women,
                          PlantGrowth, ToothGrowth, co2, Titanic, USArrests))
tic()
bigger_tib %>%
  pivot_longer(everything()) %>%
  select(value) %>%
  unique()
#> # A tibble: 11 x 1
#> value
#> <list>
#> 1 <tibble [1,924,665 x 5]>
#> 2 <tibble [87 x 14]>
#> 3 <tibble [336,776 x 19]>
#> 4 <df [32 x 11]>
#> 5 <df [31 x 3]>
#> 6 <df [15 x 2]>
#> 7 <df [30 x 2]>
#> 8 <df [60 x 3]>
#> 9 <ts [468]>
#> 10 <table [4 x 2 x 2 x 2]>
#> 11 <df [50 x 4]>
toc()
#> 0.97 sec elapsed
I know the example isn't great because it doesn't demonstrate a problematic run time, but with my real data this procedure gets quite slow, and I want to speed it up.
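One direction that might be worth benchmarking is to skip the reshape entirely: concatenate the columns with base R and call unique() on the result. The following is only a sketch under assumptions. small_df is a hypothetical stand-in for bigger_tib built from datasets that ship with base R, and whether this is actually faster on the real data is untested.

```r
# Atomic columns: flatten all columns into one vector, then drop duplicates.
df <- data.frame(one_letter = rep("a", 5),
                 other_letters = letters[2:6],
                 stringsAsFactors = FALSE)
all_letters <- data.frame(
  all_letters_combined = unique(unlist(df, use.names = FALSE))
)

# List-columns: small_df is a hypothetical stand-in for bigger_tib.
# Assigning a list whose length equals the row count creates a list-column.
small_df <- data.frame(row = 1:3)
small_df$one_df    <- rep(list(women), 3)
small_df$other_dfs <- list(mtcars, trees, women)

# c() joins the two list-columns into one list of data frames;
# unique() on a list drops elements that are exact duplicates.
combined <- unique(c(small_df$one_df, small_df$other_dfs))
length(combined)
```

Unlike pivot_longer(), which interleaves the columns row by row, c() appends one column after the other, so the order of the unique values can differ when the first column holds more than one distinct value.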