Spreading a dataframe with two grouping columns

Question

I have a data set of teachers as follows:

df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", 'C'),
  seg = c("1", '1', "2", "2", "1", "2", "1", "2"),
  claim = c(
    "beth",
    'john',
    'john',
    'beth',
    'summer',
    'summer',
    "hannah",
    "hannah"
  )
)

I would ideally like to spread my dataset like this:

Desired output.

Any ideas for how I can use either spread or pivot_wider to achieve this? The issue is that there are two grouping variables here (teacher and segment). Some teachers have more than one row for the same segment, but others don't.
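To see which teacher/segment pairs repeat, a quick base R count helps. This sketch rebuilds the example data with `stringsAsFactors = FALSE` (an addition so the columns are character on R versions both before and after 4.0) and tabulates rows per pair; any count above 1 marks a pair that needs an extra column in the wide result.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps the columns
# as character on every R version (added here for reproducibility)
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim   = c("beth", "john", "john", "beth",
              "summer", "summer", "hannah", "hannah"),
  stringsAsFactors = FALSE
)

# Rows per teacher/segment pair: teacher A has two rows for each segment,
# B and C have one each, so only A fills the extra "double" columns
counts <- table(df$teacher, df$seg)
print(counts)
```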

Try `library(data.table); dcast(setDT(df), teacher ~ paste0("seg_", seg) + rowid(teacher, seg), value.var = "claim")` — markus, Jan 10 '20 at 23:13
Hey, I like your approach, but why don't you post it as a proper answer? :) — mnist, Jan 10 '20 at 23:33
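Which columns are passed to `rowid()` matters here. Counting within `teacher` alone numbers teacher A's rows 1 to 4, so A's segment 2 rows get 3 and 4, whereas counting within `teacher` and `seg` restarts at 1 for each segment. The base R sketch below mimics both counters with `ave()`; treating `ave(..., FUN = seq_along)` as equivalent to `data.table::rowid()` is an assumption based on rowid numbering rows within each group in order of appearance.

```r
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2"),
  stringsAsFactors = FALSE
)

# Running count within teacher only (mimics rowid(teacher))
n_teacher <- ave(seq_len(nrow(df)), df$teacher, FUN = seq_along)

# Running count within teacher AND segment (mimics rowid(teacher, seg))
n_teacher_seg <- ave(seq_len(nrow(df)), df$teacher, df$seg, FUN = seq_along)

print(data.frame(df, n_teacher, n_teacher_seg))
```

Only the second counter lines up repeated segments across teachers (every first occurrence of a segment gets 1, every second gets 2), which is the shape the wide layout in the question needs.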

akrun · Accepted Answer · answered Jan 10 '20 at 23:21

One option is to create a sequence column grouped by 'teacher' and 'seg', use it to build the new column names, and then call pivot_wider:

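library(dplyr)
library(tidyr)
library(stringr)
df %>%
  group_by(teacher, seg) %>%
  # within each teacher/segment pair, the 1st row gets "" and the 2nd gets "double"
  mutate(segN = c("", "double")[row_number()]) %>%
  ungroup() %>%
  # build the new column names: "seg1", "seg1double", "seg2", "seg2double"
  mutate(seg = str_c("seg", seg, segN)) %>%
  select(-segN) %>%
  pivot_wider(names_from = seg, values_from = claim)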
# A tibble: 3 x 5
#  teacher seg1   seg1double seg2   seg2double
#   <fct>   <fct>  <fct>      <fct>  <fct>     
#1 A       beth   john       john   beth      
#2 B       summer <NA>       summer <NA>      
#3 C       hannah <NA>       hannah <NA>

The same result can be obtained more concisely with rowid() from data.table:

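library(data.table)
# rowid(teacher, seg) numbers rows within each teacher/segment pair (1, 2, ...)
df %>%
  mutate(seg = str_c('seg', c('', '_double')[rowid(teacher, seg)], seg)) %>%
  pivot_wider(names_from = seg, values_from = claim)
# or, with the older tidyr verb, replace the last step with:
# spread(seg, claim)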
#  teacher   seg1 seg_double1   seg2 seg_double2
#1       A   beth        john   john        beth
#2       B summer        <NA> summer        <NA>
#3       C hannah        <NA> hannah        <NA>
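For readers without the tidyverse, the same three steps (a per-pair counter, a suffix lookup, then a pivot) can be sketched in base R. This is not how `pivot_wider()` is implemented; it only reproduces the reshaping logic with `ave()` and matrix indexing, and it returns a character matrix rather than a tibble. The names `n`, `key`, and `wide` are illustrative.

```r
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim   = c("beth", "john", "john", "beth",
              "summer", "summer", "hannah", "hannah"),
  stringsAsFactors = FALSE
)

# Step 1: counter within each teacher/segment pair (like row_number() after group_by)
n <- ave(seq_len(nrow(df)), df$teacher, df$seg, FUN = seq_along)

# Step 2: new column names such as "seg1" and "seg1double"
key <- paste0("seg", df$seg, c("", "double")[n])

# Step 3: one row per teacher, one column per key; cells with no data stay NA
teachers <- unique(df$teacher)
keys <- unique(key)
wide <- matrix(NA_character_, nrow = length(teachers), ncol = length(keys),
               dimnames = list(teachers, keys))
wide[cbind(match(df$teacher, teachers), match(key, keys))] <- df$claim
print(wide)
```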

I'm trying to make sense of this line: `mutate(segN = c("", "double")[row_number()])`. I can see that it fills in "double" for teachers who have two rows of the same segment, but I don't understand how. — NewBee, Jan 11 '20 at 01:26
@NewBee Here, each group has at most two rows, so `row_number()` returns 1 for a single row and 1, 2 for a pair. Those numbers are then used as position indices into `c("", "double")`, so the first row of a group gets "" and the second gets "double". — akrun, Jan 11 '20 at 17:29
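A small illustration of the lookup described above: `row_number()` produces positions, and indexing `c("", "double")` with those positions picks the suffix. The three-row group is hypothetical (it does not occur in the question's data) and shows the limit of this approach: position 3 does not exist, so R returns `NA`.

```r
labels <- c("", "double")

# Group with two rows (e.g. teacher A, segment 1): positions 1 and 2
two_rows <- labels[1:2]     # "" and "double"

# Group with one row (e.g. teacher B, segment 1): position 1 only
one_row <- labels[1]        # ""

# Hypothetical group with three rows: position 3 is out of range, giving NA
three_rows <- labels[1:3]   # "", "double", NA

print(two_rows)
print(one_row)
print(three_rows)
```

So this suffix vector only covers up to two rows per teacher/segment pair; more repeats would need more labels.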

score 0 · Answer 2 · answered Jan 10 '20 at 23:20

You can also do this in base R with the reshape function, after a little data preparation:

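# seg may be a factor (the default for strings before R 4.0);
# convert it so that new labels such as "1double" can be assigned
df$seg <- as.character(df$seg)
# find rows whose teacher/seg pair already appeared earlier
dups <- duplicated(df[, 1:2])
# rename the repeated segments, e.g. "1" -> "1double"
df[dups, 2] <- paste0(df[dups, 2], "double")

# use base R's reshape function, which automatically builds suitable names
wide <- reshape(df, v.names = "claim", idvar = "teacher",
                timevar = "seg", direction = "wide", sep = "")

# rename claim1, claim1double, ... to seg1, seg1double, ...
names(wide) <- gsub("claim", "seg", names(wide))
wide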
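The two preparation steps in this answer can be checked in isolation. `duplicated()` flags every teacher/segment pair after its first occurrence, and the renaming only works if `seg` is character: before R 4.0, `data.frame()` turned strings into factors by default, and assigning a label that is not an existing factor level yields `NA`. The sketch below builds both versions explicitly so it behaves the same on any R version.

```r
pairs <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2")
)

# TRUE marks the second (and later) occurrence of a teacher/seg pair
dups <- duplicated(pairs)
print(dups)

# As a factor, "1double" and "2double" are not valid levels, so they become NA
# (R warns about an invalid factor level; the warning is suppressed here)
seg_fac <- factor(pairs$seg)
suppressWarnings(seg_fac[dups] <- paste0(seg_fac[dups], "double"))
print(seg_fac)

# As character, the new labels are kept
seg_chr <- as.character(pairs$seg)
seg_chr[dups] <- paste0(seg_chr[dups], "double")
print(seg_chr)
```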
