Consider a data-frame df1
similar to the one shown
ID EDUCATION OCCUPATION BINARY_VAR
1 Undergrad Student 1
2 Grad Business Owner 1
3 Undergrad Unemployed 0
4 PhD Other 1
You may create your own random df1
using R-code below
ID <- c(1:4)
EDUCATION <- sample (c('Undergrad', 'Grad', 'PhD'), 4, rep = TRUE)
OCCUPATION <- sample (c('Student', 'Business Owner', 'Unemployed', 'Other'), 4, rep = FALSE)
BINARY_VAR <- sample(c(0, 1), 4, rep = TRUE)
df1 <- data.frame(ID, EDUCATION, OCCUPATION, BINARY_VAR)
# Convert to factor
df1[, names(df1)] <- lapply(df1[, names(df1)] , factor)
From this, I need to derive another data-frame df2
that would look like this
ID Undergrad Grad PhD Student Business Owner Unemployed Other BINARY_VAR
1 1 0 0 1 0 0 0 1
2 1 1 0 0 1 0 0 1
3 1 0 0 0 0 1 0 0
4 1 1 1 0 0 0 1 1
You must have noticed how for level PhD
, the other factor levels under EDUCATION
also hold true since EDUCATION
is the highest education level for that ID
. That, however, is the secondary objective.
I can't seem to figure out a way to obtain a data-frame with each column giving the truth value corresponding to individual factor levels in its parent data-frame. Is there a package in R that could help? Or maybe a way to code this?
Can I do this using melt
?
I read through previously asked question(s) that looked similar, but they deal with frequencies of occurrence.
Edit:
As recommended by Sumedh, one way to do this is using dummyVars
from caret
.
dummies <- dummyVars(ID ~ ., data = df1)
df2 <- data.frame(predict(dummies, newdata = df1))
df2 <- df2 [1:7]