3

I would like to transform the column word into another column, but with the original rows combined together, separed by _.

  • Original:
> head(df, 10)
# A tibble: 10 x 1
   word    
   <chr>   
 1 Jason   
 2 Oscar   
 3 Maleeka 
 4 Janet   
 5 Gabriel 
 6 Raheema 
 7 Bryce   
 8 Nasreen 
 9 Hishaam 
10 Thadduse
  • desired output:
word 
Jason_Oscar_Maleeka_Janet_Gabriel_Raheema_Bryce_Nasreen_Hishaam_Thadduse
.
.
.

Question

1 How do I pivot word so that each new row contains 10 words per row separated by "_" ? (Tidyverse and stringr approaches would be much appreciated - thanks!)

  • Data:
> dput(df)
structure(list(word = c("Jason", "Oscar", "Maleeka", "Janet", 
"Gabriel", "Raheema", "Bryce", "Nasreen", "Hishaam", "Thadduse", 
"Marcos", "Daijah", "Chassity", "Carlito", "Chidiebere", "Matthew", 
"Maureene", "Jillian", "Markus", "Aaron", "Ramziyya", "Marquez", 
"Kiera", "Farajallah", "Larisa", "Davier", "Shujaa", "Vincent", 
"Orlando", "Joseph", "Desean", "Chelsea", "Faadil", "Christopher", 
"Aarifa", "Joel", "Matthew", "Jacob", "Aeones", "Matthew", "Jacob", 
"Savannah", "Nadia", "Kaleem", "Tanner", "Sabeeha", "Caitlyn", 
"Taylor", "Sydney", "Devin")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -50L))
Larissa Cury
  • 806
  • 2
  • 11

3 Answers3

4

What you are describing isn't a pivot exactly, but you can achieve it as follows:

library(tidyverse)

df %>%
  group_by((row_number() - 1) %/% 10) %>%
  summarise(word = paste(word, collapse = '_')) %>%
  select(word)
#> # A tibble: 5 x 1
#>   word                                                                           
#>   <chr>                                                                          
#> 1 Jason_Oscar_Maleeka_Janet_Gabriel_Raheema_Bryce_Nasreen_Hishaam_Thadduse       
#> 2 Marcos_Daijah_Chassity_Carlito_Chidiebere_Matthew_Maureene_Jillian_Markus_Aaron
#> 3 Ramziyya_Marquez_Kiera_Farajallah_Larisa_Davier_Shujaa_Vincent_Orlando_Joseph  
#> 4 Desean_Chelsea_Faadil_Christopher_Aarifa_Joel_Matthew_Jacob_Aeones_Matthew     
#> 5 Jacob_Savannah_Nadia_Kaleem_Tanner_Sabeeha_Caitlyn_Taylor_Sydney_Devin 
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • 1
    beat me by 52 seconds – r2evans May 11 '23 at 12:28
  • Thank you!! Would you mind explaning the logic behind it? – Larissa Cury May 13 '23 at 18:32
  • 1
    @LarissaCury sure. If you take the row numbers and subtract 1, then divide by 10, the first ten values would be 0, 0.1, 0.2, etc up to 0.9. The next ten values would be 1.0, 1.1, 1.2, etc up to 1.9. Note that the integer part of the first 10 numbers is 0, the second ten will be 1, the third 10 will be 2, etc. To divide but only get the integer part we use the modulus operator `%/%`, so `(row_number() -1) %/% 10` gives us this sequence of integers. Now we `group_by` these numbers to group into clumps of 10. Finally, we `paste` the clumps together using `"_"` as a separator. – Allan Cameron May 13 '23 at 18:47
4

Here is a base R option with matrix approach

data.frame(
    word =
        do.call(
            paste,
            c(as.data.frame(matrix(unlist(df), ncol = 10, byrow = TRUE)),
                sep = "_"
            )
        )
)

which gives

                                                                             word
1        Jason_Oscar_Maleeka_Janet_Gabriel_Raheema_Bryce_Nasreen_Hishaam_Thadduse
2 Marcos_Daijah_Chassity_Carlito_Chidiebere_Matthew_Maureene_Jillian_Markus_Aaron
3   Ramziyya_Marquez_Kiera_Farajallah_Larisa_Davier_Shujaa_Vincent_Orlando_Joseph
4      Desean_Chelsea_Faadil_Christopher_Aarifa_Joel_Matthew_Jacob_Aeones_Matthew
5          Jacob_Savannah_Nadia_Kaleem_Tanner_Sabeeha_Caitlyn_Taylor_Sydney_Devin
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
3

Another way to do it with tidyverse, with real pivoting.

library(tidyverse)

df %>% 
  mutate(rn = rep(1:10, nrow(.) %/% 10),
         group = rep(1:5, each = nrow(.) %/% 5)) %>% 
  pivot_wider(names_from = rn, values_from = word) %>% 
  select(-group) %>% 
  unite(col = word, sep = "_")

The following modified version is ugly, but is more dynamic (i.e. you only need to change the value to word_number without the need to modify the piped operation).

word_number <- 10
set_number <- nrow(df) %/% word_number

df %>% 
  mutate(rn = rep(1:word_number, set_number),
         group = rep(1:set_number, each = nrow(.) %/% set_number)) %>% 
  pivot_wider(names_from = rn, values_from = word) %>% 
  select(-group) %>% 
  unite(col = word, sep = "_")

Output

# A tibble: 5 × 1
  word                                                                           
  <chr>                                                                          
1 Jason_Oscar_Maleeka_Janet_Gabriel_Raheema_Bryce_Nasreen_Hishaam_Thadduse       
2 Marcos_Daijah_Chassity_Carlito_Chidiebere_Matthew_Maureene_Jillian_Markus_Aaron
3 Ramziyya_Marquez_Kiera_Farajallah_Larisa_Davier_Shujaa_Vincent_Orlando_Joseph  
4 Desean_Chelsea_Faadil_Christopher_Aarifa_Joel_Matthew_Jacob_Aeones_Matthew     
5 Jacob_Savannah_Nadia_Kaleem_Tanner_Sabeeha_Caitlyn_Taylor_Sydney_Devin  
benson23
  • 16,369
  • 9
  • 19
  • 38