I have a data.frame
that looks as follows:
df <- data.frame(A = NA, B = NA, C = c("a,b,c", "c,b", "d,a"), stringsAsFactors = FALSE)
df
A B C
1 NA NA a,b,c
2 NA NA c,b
3 NA NA d,a
Columns A
and B
(and some more in my real data) are set to NA
to indicate that their entries are not necessary for the question.
The goal is it should look like:
df_goal <- data.frame(A = NA, B = NA, a = c(TRUE, FALSE, TRUE), b = c(TRUE,
TRUE, FALSE), c = c(TRUE, TRUE, FALSE), d = c(FALSE, FALSE, TRUE))
df_goal
A B a b c d
1 NA NA TRUE TRUE TRUE FALSE
2 NA NA FALSE TRUE TRUE FALSE
3 NA NA TRUE FALSE FALSE TRUE
I achieved this doing:
df <- cbind(df[, 1:2], as.data.frame(t(apply(read.table(text = df$C, sep = ",", as.is = TRUE, fill = TRUE, na.strings = "")
, 1,
FUN = function(x) sort(x, decreasing= FALSE, na.last = TRUE))), stringsAsFactors = FALSE))
df <- cbind(df[, 1:2], as.data.frame(sapply(c("a", "b", "c", "d"), function(y) {sapply(1:nrow(df), function(x) {ifelse(y %in% df[x, ], TRUE, FALSE)})})))
df
A B a b c d
1 NA NA TRUE TRUE TRUE FALSE
2 NA NA FALSE TRUE TRUE FALSE
3 NA NA TRUE FALSE FALSE TRUE
identical(df, df_goal)
# [1] TRUE
Are there more concise options to achieve what I want?
Edit after @Sonny's comment:
I also thought about a tidyr
option, but could not get there:
library(tidyr)
df %>% separate(C, c("a", "b", "c", "d"))
A B a b c d
1 NA NA a b c <NA>
2 NA NA c b <NA> <NA>
3 NA NA d a <NA> <NA>
This is still unsorted and so spread
doesn't really work..
What am I missing?