I have a data set of teachers as follows:

df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", 'C'),
  seg = c("1", '1', "2", "2", "1", "2", "1", "2"),
  claim = c(
    "beth",
    'john',
    'john',
    'beth',
    'summer',
    'summer',
    "hannah",
    "hannah"
  )
)

I would ideally like to spread my dataset like this:

Desired output: [image not available]

Any ideas for how I can use either spread or pivot_wider to achieve this? The difficulty is that there are two grouping variables here (teacher and seg). Some teachers have the same segment more than once, but others don't.

mnist
NewBee

2 Answers

One option would be to create a sequence column grouped by 'teacher' and 'seg', and then use pivot_wider:

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
  group_by(teacher, seg) %>%
  mutate(segN = c("", "double")[row_number()]) %>%
  ungroup() %>%
  mutate(seg = str_c("seg", seg, segN)) %>%
  select(-segN) %>%
  pivot_wider(names_from = seg, values_from = claim)
# A tibble: 3 x 5
#  teacher seg1   seg1double seg2   seg2double
#   <fct>   <fct>  <fct>      <fct>  <fct>     
#1 A       beth   john       john   beth      
#2 B       summer <NA>       summer <NA>      
#3 C       hannah <NA>       hannah <NA>    

It can be simplified with rowid from data.table

library(data.table)
df %>% 
  mutate(seg = str_c('seg', c('', '_double')[rowid(teacher, seg)], seg)) %>%
  pivot_wider(names_from = seg, values_from = claim)
# or use spread instead of pivot_wider:
#   spread(seg, claim)
#  teacher   seg1 seg_double1   seg2 seg_double2
#1       A   beth        john   john        beth
#2       B summer        <NA> summer        <NA>
#3       C hannah        <NA> hannah        <NA>
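
Both versions above index a two-element vector, so they assume a teacher has at most two rows per segment. As a hedged sketch of one way around that, the within-group counter can go straight into the column names instead. The helper column `n` and the resulting names (`seg1_1`, `seg1_2`, ...) are illustrative choices, not part of the desired output in the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim = c("beth", "john", "john", "beth",
            "summer", "summer", "hannah", "hannah")
)

wide <- df %>%
  group_by(teacher, seg) %>%
  mutate(n = row_number()) %>%   # hypothetical counter: 1, 2, ... within teacher/seg
  ungroup() %>%
  # build each column name from seg and n, joined by "_" and prefixed with "seg"
  pivot_wider(names_from = c(seg, n), values_from = claim,
              names_prefix = "seg")
wide
```

This trades the "double" suffix for a numeric one, so it keeps working if a teacher has three or more rows for the same segment.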
akrun
  • I'm trying to make sense of this line: `mutate(segN = c("", "double")[row_number()])`. I see that it populates "double" in teacher rows that have two of the same segment, but I don't understand how. – NewBee Jan 11 '20 at 01:26
  • @NewBee Here, each group has at most two elements, so `row_number()` returns the indices 1, 2. Those indices are then used as positions into `c("", "double")`, which gives "" for the first row of a group and "double" for the second – akrun Jan 11 '20 at 17:29
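
To make the indexing step in that comment concrete, here is a minimal base R sketch. The names `suffix` and `idx` are illustrative only; `idx` stands in for what `row_number()` returns inside a group with two rows.

```r
# Illustrative names (not from the answer): `suffix` holds the two labels,
# `idx` mimics row_number() inside a group that has two rows.
suffix <- c("", "double")
idx <- c(1L, 2L)

# Indexing a vector by position returns the elements at those positions:
# "" for the first row of the group, "double" for the second.
first_two <- suffix[idx]

# A group with a single row only ever asks for position 1.
single <- suffix[1L]

# Caveat: a third row in a group would index past the end and give NA.
third <- suffix[3L]
```

So teachers B and C, who have one row per segment, get the empty suffix, while teacher A's second row in each segment gets "double".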

You can also do this in base R with the powerful reshape function, after some minor data preparation:

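# seg may be a factor (the default in R < 4.0); convert it so new labels can be assigned
df$seg <- as.character(df$seg)
# find duplicated teacher/seg combinations
dups <- duplicated(df[, 1:2])
# assign new names to duplicates
df[dups, 2] <- paste0(df[dups, 2], "double")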

# use base r reshape function that automatically builds suitable names
wide <- reshape(df, v.names = "claim", idvar = "teacher",
                timevar = "seg", direction = "wide", sep = "")

# change varnames to the desired output
names(wide) <- gsub("claim", "seg", names(wide))
wide
mnist