I have a large data frame that I want to filter and turn into a binary (presence/absence) data frame based on several conditions.
This is the original data frame:
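a1 <- data.frame(
  ID = c(rep("ID_1", 3), rep("ID_2", 3)),
  gene = c("A", "D", "X", "D", "D", "A"),
  C = c("Q", "R", "S", "S", "R", "Q"),
  D = c(8, 3, 3, 4, 5, 4),
  E = sample(c("silent", "non-silent"), 6, replace = TRUE),
  stringsAsFactors = TRUE  # keep the character columns as factors (no longer the default since R 4.0)
)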
For example (E is sampled at random, so its values will differ between runs):
ID gene C D E
1 ID_1 A Q 8 non-silent
2 ID_1 D R 3 silent
3 ID_1 X S 3 silent
4 ID_2 D S 4 non-silent
5 ID_2 D R 5 silent
6 ID_2 A Q 4 non-silent
I have now made an empty data frame with the IDs as columns and the genes as rows, like so:
genes <- sort(unique(as.character(a1$gene)))  # works whether or not gene is a factor
ids <- sort(unique(as.character(a1$ID)))
dt <- as.data.frame(matrix(NA, length(genes), length(ids) + 1))
colnames(dt) <- c("gene", ids)
dt$gene <- genes
gene ID_1 ID_2
1 A NA NA
2 D NA NA
3 X NA NA
Now I want to put a 1 for each gene that is present for a given ID and a 0 for each gene that is not. Later I would also like to add other conditions, for example only putting a 1 where column E is non-silent. Is there a base R way to do this, or a way with a package such as data.table or plyr (ddply)?
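To make the desired result concrete, here is a minimal sketch of one possible base R approach using table(). It is only an illustration under stated assumptions: E is hard-coded to the values shown in the example output so the result is reproducible, and the helper name presence and its keep argument are made up for this sketch.

```r
# Example data from the question, with E fixed (instead of sampled) for reproducibility
a1 <- data.frame(
  ID   = c(rep("ID_1", 3), rep("ID_2", 3)),
  gene = c("A", "D", "X", "D", "D", "A"),
  C    = c("Q", "R", "S", "S", "R", "Q"),
  D    = c(8, 3, 3, 4, 5, 4),
  E    = c("non-silent", "silent", "silent", "non-silent", "silent", "non-silent")
)

# Fix the row and column sets up front so that genes removed by a filter
# still appear as all-zero rows
genes <- sort(unique(as.character(a1$gene)))
ids   <- sort(unique(as.character(a1$ID)))

# Hypothetical helper: 1 if the gene occurs at least once for the ID
# among the rows selected by `keep`, otherwise 0
presence <- function(df, keep = rep(TRUE, nrow(df))) {
  sub <- df[keep, , drop = FALSE]
  counts <- table(factor(sub$gene, levels = genes),
                  factor(sub$ID, levels = ids))
  binary <- (unclass(counts) > 0) * 1L  # counts -> 0/1 integer matrix
  data.frame(gene = genes, binary, row.names = NULL, check.names = FALSE)
}

all_genes  <- presence(a1)                          # any occurrence counts
non_silent <- presence(a1, a1$E == "non-silent")    # only non-silent rows count
```

Any further condition can be passed as a logical vector in keep, for example a1$E == "non-silent" & a1$D > 3. Because the factor levels are fixed before filtering, the output keeps the same genes and IDs regardless of the condition.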