
I have a data frame where one column holds a list of values for each row.

library(dplyr)  # for %>% and mutate()

data <- data.frame(
  id = c(10, 20, 30, 40),
  x = c("a,b", "a,d", "b,c", "d,b"),
  stringsAsFactors = FALSE
)
data <- data %>% mutate(x = strsplit(x, ",", fixed = TRUE))

  id   x
1 10 a,b
2 20 a,d
3 30 b,c
4 40 d,b

I want an easy way to create dummy variables based on whether each value appears in a row's list or not, and to add those columns to the original data frame.

For example,

  id a b c d
1 10 1 1 0 0
2 20 1 0 0 1
3 30 0 1 1 0
4 40 0 1 0 1

This is very similar to the question here, but I can't get any of the suggested methods to work with data frames. I've looked at the dummies, qdapTools, and fastDummies packages, but none of them seems to do what I'm looking for.

Thank you so much for any help!

Gus Beringer
    Don't run the `strsplit` step. Run this directly on `data`. `result <- splitstackshape::cSplit_e(data, "x", type = 'character', fill = 0)` – Ronak Shah Nov 26 '20 at 04:50
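If the `strsplit` step has already been run, the list column can still be used directly. Below is a minimal base-R sketch, not taken from any of the packages mentioned, that checks each row's vector against the set of all distinct values with `%in%`. The names `levs`, `dummies` and `result` are illustrative only.

```r
# Rebuild the question's data, including the strsplit() step,
# so x is a list column of character vectors.
data <- data.frame(
  id = c(10, 20, 30, 40),
  x = c("a,b", "a,d", "b,c", "d,b"),
  stringsAsFactors = FALSE
)
data$x <- strsplit(data$x, ",", fixed = TRUE)

# All distinct values across the list column, sorted alphabetically
levs <- sort(unique(unlist(data$x)))

# For each row: 1 if a level appears in that row's vector, else 0.
# vapply() returns a levels-by-rows matrix, so transpose it with t().
dummies <- t(vapply(data$x, function(v) as.integer(levs %in% v),
                    integer(length(levs))))
colnames(dummies) <- levs

# Attach the dummy columns to the id column (the list column is dropped)
result <- cbind(data["id"], dummies)
result
```

Because the value set is collected with `sort(unique(...))`, the dummy columns come out in alphabetical order without a manual reordering step.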

1 Answer


Does this work:

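library(dplyr)
library(tidyr)
# x must still be the comma-separated string (skip the strsplit() step)
data %>%
  separate_rows(x, sep = ',') %>%                    # one row per id/value pair
  mutate(val = 1) %>%                                 # indicator value to spread
  pivot_wider(names_from = x, values_from = val,
              values_fill = list(val = 0)) %>%        # absent values become 0
  select(1, 2, 3, 5, 4)                               # reorder to id, a, b, c, d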
# A tibble: 4 x 5
     id     a     b     c     d
  <dbl> <dbl> <dbl> <dbl> <dbl>
1    10     1     1     0     0
2    20     1     0     0     1
3    30     0     1     1     0
4    40     0     1     0     1
Karthik S
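The same split-then-widen idea as the answer can be sketched in base R without tidyr, by building a long table of id/value pairs and cross-tabulating it with `table()`. This is an alternative technique, not the answer's code; the intermediate names `parts`, `long`, `tab` and `m` are illustrative. Since `table()` sorts the value levels, the columns come out as a, b, c, d without a hard-coded `select()`.

```r
# Start from the original character column (no strsplit() beforehand)
data <- data.frame(
  id = c(10, 20, 30, 40),
  x = c("a,b", "a,d", "b,c", "d,b"),
  stringsAsFactors = FALSE
)

# Long format: one row per (id, value) pair, similar to separate_rows()
parts <- strsplit(data$x, ",", fixed = TRUE)
long <- data.frame(
  id = rep(data$id, lengths(parts)),
  x = unlist(parts),
  stringsAsFactors = FALSE
)

# Cross-tabulate ids against values; using data$id as the factor levels
# keeps the original row order. Value columns are sorted alphabetically.
tab <- table(factor(long$id, levels = data$id), long$x)

# Turn counts into 0/1 indicators (a value repeated within one row would
# otherwise be counted twice), then convert back to a data frame.
m <- unclass(tab)
m <- 1L * (m > 0)
result <- data.frame(id = data$id, m, row.names = NULL, check.names = FALSE)
result
```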