I am trying to split one column in a data frame into multiple columns, using the distinct values from the original column as the new column names. Each new column should then hold a 1 if that value occurred in the original column for that row, or a 0 if it did not. That is hard to explain in words, so here is an example:

df <- data.frame(subject = c(1:4), Location = c('A', 'A/B', 'B/C/D', 'A/B/C/D'))  

#   subject Location
# 1       1        A
# 2       2      A/B
# 3       3    B/C/D
# 4       4  A/B/C/D

I would like to expand it to wide format, with 1s and 0s (or TRUE and FALSE), something like this:

#   subject    A  B  C  D
# 1       1    1  0  0  0
# 2       2    1  1  0  0
# 3       3    0  1  1  1
# 4       4    1  1  1  1  

I have looked into the separate function from tidyr and the cast function from reshape2, but I seem to be getting hung up on assigning the logical values. Any help would be greatly appreciated. Thank you.

Edited by Henrik; asked by Brad.
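The question mentions reshaping with tidyr's separate and reshape2's cast. The same long-then-wide idea can be sketched in base R without extra packages: split each Location into its parts, build a long table of (subject, value) pairs, and cross-tabulate. This is a minimal sketch assuming the example df and the "/" separator; the names parts, long, counts, ind and wide are illustrative only. Because table() counts occurrences, the counts > 0 comparison turns them into indicators; dropping the * 1L keeps TRUE/FALSE instead of 1/0.

```r
df <- data.frame(subject = 1:4, Location = c("A", "A/B", "B/C/D", "A/B/C/D"))

# split each Location string on "/" (as.character guards against factor columns)
parts <- strsplit(as.character(df$Location), "/", fixed = TRUE)

# long format: one row per (subject, value) pair
long <- data.frame(subject = rep(df$subject, lengths(parts)),
                   value   = unlist(parts))

# cross-tabulate subjects against values; every subject stays a row level
counts <- unclass(table(factor(long$subject, levels = df$subject), long$value))

# counts -> 1/0 indicators (remove '* 1L' to keep TRUE/FALSE)
ind <- (counts > 0) * 1L

wide <- data.frame(subject = df$subject, ind)
wide
```

Using table() on the long data means a value that appears twice in one row (e.g. "A/A") is still reported as a single 1.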

2 Answers

You could try cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(data = df, split.col = "Location", sep = "/",
         type = "character", drop = TRUE, fill = 0)
#   subject Location_A Location_B Location_C Location_D
# 1       1          1          0          0          0
# 2       2          1          1          0          0
# 3       3          0          1          1          1
# 4       4          1          1          1          1
Answered by Henrik
  • Thank you, Henrik, I had not come across this package before (I am relatively new to R, as I'm sure you can tell)! – Brad Jan 21 '15 at 22:41

You could take the following step-by-step approach.

## get the unique values after splitting
u <- unique(unlist(strsplit(as.character(df$Location), "/")))
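## test each unique value against 'Location': one column per value,
## with FUN.VALUE sized to the number of rows in 'df' (not length(u))
m <- vapply(u, grepl, logical(nrow(df)), x = df$Location, fixed = TRUE)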
## coerce to integer representation
m[] <- as.integer(m)
## bind 'm' to 'subject'
cbind(df["subject"], m)
#   subject A B C D
# 1       1 1 0 0 0
# 2       2 1 1 0 0
# 3       3 0 1 1 1
# 4       4 1 1 1 1
Answered by Rich Scriven
  • Thank you, Richard. I just had to change the length to that of the actual data frame I am working with, and this worked. Thanks for the great insight! – Brad Jan 21 '15 at 23:03
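One caveat with the grepl step-by-step approach: grepl matches substrings (and, without fixed = TRUE, regular expressions), so a value such as "A" would also flag a row whose Location is "AB". A minimal sketch that compares whole tokens instead uses %in% on the split parts and returns TRUE/FALSE, which the question also accepts. It assumes the "/" separator; the fifth row "AB" is a hypothetical addition to show the difference, and the names parts, u, m and res are illustrative.

```r
# example data plus a hypothetical fifth row "AB"
df <- data.frame(subject = 1:5,
                 Location = c("A", "A/B", "B/C/D", "A/B/C/D", "AB"))

# split each Location into whole tokens
parts <- strsplit(as.character(df$Location), "/", fixed = TRUE)
u <- sort(unique(unlist(parts)))

# for each unique value, test whole-token membership in every row's parts;
# the result is a logical matrix with nrow(df) rows and one column per value
m <- vapply(u, function(v) vapply(parts, function(p) v %in% p, logical(1)),
            logical(nrow(df)))

res <- cbind(df["subject"], m)
res
```

Here row 5 is TRUE only in column AB, whereas the grepl version would also mark it TRUE in column A. Wrapping m in m[] <- as.integer(m) gives 1/0 instead.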