
I have a data frame with many variables; two of them are shown in the sample dataset test below:

test <- data.frame(row_numb = c(1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,  3),
                   words = c('apply','assistance','benefit','compass','medical','online','renew','meet','service','website','center','country','country','develop','highly','home','major','obtain'))

I am trying to join the words from the words column into a single Dictionary column of a new data frame fdata, grouped by row_numb and separated by commas, using the code below:

fdata <- test %>% 
    select(row_numb, words) %>% 
    group_by(row_numb) %>% 
    unite(Dictionary, words, sep=",")

I couldn't get the result I was expecting:

 row_numb   Dictionary
 1          apply, assistance, benefit, compass, medical, online, renew
 2          meet, service.... and so forth

Can someone help me find the mistake I am making?

alistaire
LeMarque
  • `test %>% group_by(row_numb) %>% summarise(word = toString(words))`; `unite` is to paste together multiple columns. – alistaire Jul 21 '18 at 22:02
  • Thanks, it worked. Could you please add some examples of both, with some explanation, to help the community? – LeMarque Jul 21 '18 at 22:07

2 Answers


unite is for pasting multiple columns together row by row, not for aggregating the values of one column across rows. For that, use summarise with paste(..., collapse = ', '), or for the particular case of a comma-separated string, toString:

library(tidyverse)

test <- data.frame(row_numb = c(1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,  3),
                   words = c('apply','assistance','benefit','compass','medical','online','renew','meet','service','website','center','country','country','develop','highly','home','major','obtain'))

test %>% group_by(row_numb) %>% summarise(words = toString(words))
#> # A tibble: 3 x 2
#>   row_numb words                                                         
#>      <dbl> <chr>                                                         
#> 1        1 apply, assistance, benefit, compass, medical, online, renew   
#> 2        2 meet, service, website                                        
#> 3        3 center, country, country, develop, highly, home, major, obtain
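If the separator needs to be something other than ", " (the question asks for a bare comma), paste() with collapse is the more flexible form. The following is only a minimal sketch, assuming dplyr is installed; it uses a shortened copy of the sample data, and the names fdata and Dictionary are taken from the question.

```r
library(dplyr)

# Shortened copy of the question's sample data (first five rows only)
test <- data.frame(
  row_numb = c(1, 1, 1, 2, 2),
  words    = c("apply", "assistance", "benefit", "meet", "service")
)

# paste() with collapse joins all values within each group into one string;
# unlike toString(), the separator can be chosen freely.
fdata <- test %>%
  group_by(row_numb) %>%
  summarise(Dictionary = paste(words, collapse = ","))

# toString(x) behaves like paste(x, collapse = ", ")
same <- identical(toString(c("a", "b")), paste(c("a", "b"), collapse = ", "))
```

Here fdata has one row per row_numb, and same is TRUE.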

To use unite, specify the name of the new column, and the columns that should be pasted together, optionally with a sep parameter, e.g.

iris %>% unite(sepal_l_w, Sepal.Length, Sepal.Width, sep = ' / ') %>% head()
#>   sepal_l_w Petal.Length Petal.Width Species
#> 1 5.1 / 3.5          1.4         0.2  setosa
#> 2   4.9 / 3          1.4         0.2  setosa
#> 3 4.7 / 3.2          1.3         0.2  setosa
#> 4 4.6 / 3.1          1.5         0.2  setosa
#> 5   5 / 3.6          1.4         0.2  setosa
#> 6 5.4 / 3.9          1.7         0.4  setosa
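To see why the code in the question did not collapse anything, it may help to run unite on a single column. This is a hedged illustration rather than part of the original answer: it assumes tidyr is installed and uses a shortened copy of the sample data. Because unite works across columns within each row, the number of rows should stay the same.

```r
library(tidyr)

# Shortened copy of the question's sample data
test <- data.frame(
  row_numb = c(1, 1, 2),
  words    = c("apply", "assistance", "meet")
)

# unite() pastes columns together within each row; with a single input
# column there is nothing to join, so no rows are collapsed.
united <- unite(test, Dictionary, words, sep = ",")

rows_unchanged <- nrow(united) == nrow(test)
```

The result still has one row per word, which is why a grouped summarise is needed instead.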
alistaire

Another general pattern for this kind of task is nest() followed by mutate() with map(), which is useful when no ready-made function like toString() fits what you need to do next. It's still just a three-liner: first nest() your data, then flatten the list structure, then paste/collapse it together.

library(tidyverse)

test %>%
  nest(data = -row_numb) %>%  # tidyr >= 1.0.0 expects a named argument here; nest(-row_numb) is the pre-1.0 form
  mutate(Dictionary = map(data, unlist),
         Dictionary = map_chr(Dictionary, paste, collapse = ", "))

#> # A tibble: 3 x 3
#>   row_numb data           Dictionary                                      
#>      <dbl> <list>         <chr>                                           
#> 1        1 <tibble [7 × … apply, assistance, benefit, compass, medical, o…
#> 2        2 <tibble [3 × … meet, service, website                          
#> 3        3 <tibble [8 × … center, country, country, develop, highly, home…

Created on 2018-08-14 by the reprex package (v0.2.0).
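A possible variation on the same pattern skips the intermediate unlist step by pulling the column out of each nested tibble directly. This is only a sketch under assumptions: it requires tidyr 1.0.0 or later (for the named nest() argument) along with dplyr and purrr, and it uses a shortened copy of the sample data.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Shortened copy of the question's sample data
test <- data.frame(
  row_numb = c(1, 1, 2),
  words    = c("apply", "assistance", "meet")
)

nested <- test %>%
  # Nest the words column; the remaining column (row_numb) defines the groups
  nest(data = words) %>%
  # For each nested tibble, collapse its words column into one string
  mutate(Dictionary = map_chr(data, ~ paste(.x$words, collapse = ", ")))
```

The function passed to map_chr() can be replaced with any function that returns a single string per group, which is where this pattern is more general than toString().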

Julia Silge