
So I have a data set I am trying to reshape, and I can't seem to find the right way to do it. I've looked into using dcast and spread, but I'm not sure how to get the result I want.

So I have something like:

ID var1 var2 var3 category
--------------------------
1  x    x    x     a
1  x    x    x     b
1  x    x    x     b
2  y    y    y     a
2  y    y    y     b
2  y    y    y     c
3  z    z    z     b 
3  z    z    z     b
3  z    z    z     c

I'd like it to look like this:

ID var1 var2 var3  a  b  c 
--------------------------------
1  x    x    x     1  1  0 
2  y    y    y     1  1  1
3  z    z    z     0  1  1  

Easy example data

ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c('x','x','x','y','y','y','z','z','z')
var2 <- c('x','x','x','y','y','y','z','z','z')
var3 <- c('x','x','x','y','y','y','z','z','z')
category <- c('a','b','b','a','b','c','b','b','c')

dat <- data.frame(ID,var1,var2,var3,category)
Clinton Woods

2 Answers

ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c("x","x","x","y","y","y","z","z","z")
var2 <- c("x","x","x","y","y","y","z","z","z")
var3 <- c("x","x","x","y","y","y","z","z","z")
category <- c("a","b","b","a","b","c","b","b","c")

dat <- data.frame(ID,var1,var2,var3,category)

library(tidyr)
library(dplyr)

dat %>%
  distinct() %>%                   # drop duplicate rows (e.g. ID 1 / category b twice)
  mutate(value = 1) %>%            # indicator column: 1 = category present
  spread(category, value, fill=0)  # reshape to wide; absent categories get 0

#   ID var1 var2 var3 a b c
# 1  1    x    x    x 1 1 0
# 2  2    y    y    y 1 1 1
# 3  3    z    z    z 0 1 1
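In current versions of tidyr, spread() is superseded by pivot_wider(). Below is a minimal sketch of the same pipeline using pivot_wider(), assuming tidyr >= 1.0.0 (where values_fill accepts a named list). The object name wide is just for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's example data
ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c("x","x","x","y","y","y","z","z","z")
var2 <- var1
var3 <- var1
category <- c("a","b","b","a","b","c","b","b","c")
dat <- data.frame(ID, var1, var2, var3, category)

wide <- dat %>%
  distinct() %>%                            # drop duplicate ID/category rows
  mutate(value = 1) %>%                     # 1 marks "category present"
  pivot_wider(names_from = category,        # one column per category
              values_from = value,
              values_fill = list(value = 0)) # absent combinations become 0

print(wide)
```

The result has the same columns and values as the spread() version, returned as a tibble.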
AntoniosK

As the question is tagged with dcast, I feel obliged to post a concise solution using dcast().

The OP has not explained how the values in the wide-format columns should be computed. From the expected result, it seems the OP is not interested in counting the number of occurrences but in indicating the presence or absence of each unique combination (1/0 in place of TRUE/FALSE).

Therefore, only unique rows are included in the reshape operation. length() is still used as the aggregation function because it fills empty cells with 0, as requested.

library(reshape2)
dcast(unique(dat), ... ~ category, length)
  ID var1 var2 var3 a b c
1  1    x    x    x 1 1 0
2  2    y    y    y 1 1 1
3  3    z    z    z 0 1 1
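To see why unique() matters, here is a small sketch comparing dcast() on the raw data with dcast() on the de-duplicated data. The names counts and presence are illustrative only; dcast() may print a message that it is using category as the value column.

```r
library(reshape2)

# Rebuild the question's example data
ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c("x","x","x","y","y","y","z","z","z")
var2 <- var1
var3 <- var1
category <- c("a","b","b","a","b","c","b","b","c")
dat <- data.frame(ID, var1, var2, var3, category)

# Raw data: length() counts rows, so a repeated ID/category pair gives 2
counts <- dcast(dat, ... ~ category, length)

# De-duplicated data: each ID/category pair occurs at most once, so 1 or 0
presence <- dcast(unique(dat), ... ~ category, length)

counts
presence
```

With the raw data, IDs 1 and 3 get b = 2 because category b appears twice for each of them; with unique(dat), every cell is 1 or 0, matching the requested output.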
Uwe