
I'm new to programming and R.

I have data like this, arranged in columns:

C1        C2     C3        C4          C5
Apple            Apple     Banana      Banana
Banana           Orange    Orange
Orange

I want to build a binary matrix that compares every other column to C1: 1 (TRUE) if the C1 value appears in that column, 0 (FALSE) if it does not. I want something like this:

 C1        C2     C3        C4          C5
Apple      0      1         0           0
Banana     0      0         1           1
Orange     0      1         1           0

Does anyone know how to do this? Thank you.

2 Answers


You can loop over columns C2 to C5 and look up each element of C1 in them with match(). Values that are not found come back as NA:

(!is.na(sapply(dd[-1], function(i) match(dd$C1, i)))) * 1

#     C2 C3 C4 C5
#[1,]  0  1  0  0
#[2,]  0  0  1  1
#[3,]  0  1  1  0

Or bind the result to C1 to keep the labels:

cbind.data.frame(C1 = dd$C1, (!is.na(sapply(dd[-1], function(i) match(dd$C1, i)))) * 1)

#      C1 C2 C3 C4 C5
#1  Apple  0  1  0  0
#2 Banana  0  0  1  1
#3 Orange  0  1  1  0
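
The answer above does not show how `dd` was created. Here is a minimal sketch that breaks the one-liner into steps, assuming `dd` holds the question's data as character columns with blanks stored as NA. That construction, and the names `pos` and `res`, are assumptions for illustration, not part of the answer:

```r
# Assumed reconstruction of the question's data; blank cells are NA.
dd <- data.frame(
  C1 = c("Apple", "Banana", "Orange"),
  C2 = c(NA, NA, NA),
  C3 = c("Apple", "Orange", NA),
  C4 = c("Banana", "Orange", NA),
  C5 = c("Banana", NA, NA),
  stringsAsFactors = FALSE
)

# For every column except C1, match() gives the position of each C1 value
# in that column, or NA when the value is absent.
pos <- sapply(dd[-1], function(col) match(dd$C1, col))

# !is.na() turns positions into TRUE/FALSE; multiplying by 1 gives 1/0.
res <- (!is.na(pos)) * 1

# Optional: label the rows with the C1 values.
rownames(res) <- dd$C1
res
```

Setting row names keeps the result a numeric matrix, whereas binding C1 as a column turns it into a data frame.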
Sotos

We can use %in%:

df[-1] <- +(sapply(df[-1], `%in%`, x = df$C1))
df

#      C1 C2 C3 C4 C5
#1  Apple  0  1  0  0
#2 Banana  0  0  1  1
#3 Orange  0  1  1  0

data

df <- structure(list(C1 = structure(1:3, .Label = c("Apple", "Banana", 
"Orange"), class = "factor"), C2 = c(NA, NA, NA), C3 = structure(c(1L, 
2L, NA), .Label = c("Apple", "Orange"), class = "factor"), C4 = structure(c(1L, 
2L, NA), .Label = c("Banana", "Orange"), class = "factor"), C5 = structure(c(1L, 
NA, NA), .Label = "Banana", class = "factor")), class = "data.frame",
row.names = c(NA, -3L))
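
Naming `x = df$C1` in the `sapply` call means each column is passed as the `table` argument of `%in%`, so the question asked is "which C1 values occur in this column". Here is a small sketch with made-up vectors (`keys`, `column`, `hit`, `as_int` and `swapped` are illustrative names, not from the answer):

```r
keys   <- c("Apple", "Banana", "Orange")   # plays the role of df$C1
column <- c("Apple", "Orange", NA)         # plays the role of df$C3

# Same call shape that sapply() produces: x is fixed by name,
# so the column fills the remaining `table` argument.
hit <- `%in%`(column, x = keys)
hit

# Unary plus converts the logical vector to integer 1/0.
as_int <- +hit
as_int

# Swapping the arguments asks a different question:
# which entries of the column occur in keys.
swapped <- column %in% keys
swapped
```

The direction matters because the result always has the length of `x`, which is why the answer fixes `x` to C1.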
Ronak Shah