0

I've got data set showing each persons' candy preferences. Each person has multiple rows to display all of the candy that they like, like shown below.

Name      ID      Candy
Alex      101     Kit Kat
Alex      101     Twix
Sam       102     Reeses
Charlie   103     Skittles
Charlie   103     Twix
Charlie   103     Kit Kat
Charlie   103     Hershey
Nia       104     M&M
Nia       104     Skittles
Nia       104     Reeses
...

I want to convert this to a person-level data set where the Candy columns are limited to the first 3 listed.

Name      ID     Candy 1    Candy 2    Candy 3
Alex      101    Kit Kat    Twix       NA
Sam       102    Reeses     NA         NA
Charlie   103    Skittles   Twix       Kit Kat
Nia       104    M&M        Skittles   Reeses

I originally tried to use the Spread function, but I believe it doesn't work because the number of rows per person is variable.

Is there a way to achieve this in R? I'm not quite sure how to approach this problem.

alechengg
  • 3
  • 2
  • Group the data, take the first 3 rows per group, add a column that counts rows within groups, and then reshape – camille Feb 12 '20 at 21:50

1 Answers1

0

You can summarise() the data by ID and Name, use toString() to collapse the candy variable then separate it into individual columns:

library(tidyr)
library(dplyr)

df %>%
  group_by(ID, Name) %>%
  summarise(Candy = toString(Candy)) %>%
  separate(Candy, into = paste0("Candy", 1:3), sep = ", ", fill = "right", extra = "drop")

# A tibble: 4 x 5
# Groups:   ID [4]
     ID Name    Candy1   Candy2   Candy3 
  <int> <chr>   <chr>    <chr>    <chr>  
1   101 Alex    Kit Kat  Twix     NA     
2   102 Sam     Reeses   NA       NA     
3   103 Charlie Skittles Twix     Kit Kat
4   104 Nia     M&M      Skittles Reeses  
Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56