
I'm looking to dummify a character column. Suppose you have a data frame like this one:

test_data <- data.frame("Emotion" = c("Happy",
                                      "Happy, Sad",
                                      "Sad, Angry, Nervous",
                                      "Happy, Nervous",
                                      "Happy",
                                      "Angry",
                                      "Sad, Angry",
                                      "Happy",
                                      "Happy, Sad, Angry, Nervous",
                                      "Angry, Nervous",
                                      "Sad, Nervous",
                                      "Sad, Angry, Nervous",
                                      "Happy, Angry",
                                      "Happy, Angry, Nervous",
                                      "Sad, Angry, Nervous",
                                      "Happy, Sad, Angry, Nervous",
                                      "Angry, Nervous"))

I want to turn it into this:

Happy Sad Angry Nervous
1     0   0     0
1     1   0     0
0     1   1     1
1     0   0     1
1     0   0     0
0     0   1     0
0     1   1     0
1     0   0     0
1     1   1     1
0     0   1     1
0     1   0     1
0     1   1     1
1     0   1     0
1     0   1     1
0     1   1     1
1     1   1     1
0     0   1     1
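
A minimal base R sketch of the transformation described above, assuming no extra packages: split each entry at every comma, collect the distinct emotions, then test each row for each emotion. The names `tokens`, `emotions` and `dummies` are illustrative only, and `stringsAsFactors = FALSE` is added (it is not in the original data) so that `strsplit()` receives a character vector on R versions before 4.0.

```r
# Sample data from the question. stringsAsFactors = FALSE keeps Emotion as
# character on R < 4.0, where strings would otherwise become factors.
test_data <- data.frame(
  Emotion = c("Happy", "Happy, Sad", "Sad, Angry, Nervous", "Happy, Nervous",
              "Happy", "Angry", "Sad, Angry", "Happy",
              "Happy, Sad, Angry, Nervous", "Angry, Nervous", "Sad, Nervous",
              "Sad, Angry, Nervous", "Happy, Angry", "Happy, Angry, Nervous",
              "Sad, Angry, Nervous", "Happy, Sad, Angry, Nervous",
              "Angry, Nervous"),
  stringsAsFactors = FALSE
)

# Split every entry at each comma plus any spaces after it, so an entry with
# three or four emotions yields three or four separate tokens.
tokens <- strsplit(test_data$Emotion, ",\\s*")

# Distinct emotions in order of first appearance: Happy, Sad, Angry, Nervous.
emotions <- unique(unlist(tokens))

# For each emotion: 1 if it occurs among a row's tokens, else 0.
# sapply() over the character vector uses the emotions as column names.
dummies <- sapply(emotions, function(e) {
  vapply(tokens, function(row) as.integer(e %in% row), integer(1))
})

dummies
```

The result has one 0/1 column per emotion and exactly one row per original entry, so `cbind(test_data, dummies)` attaches it without changing the row count.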

My previous post was closed, and I was pointed to this post. However, the answers there don't work for me, as they seem to assume that each row contains only two emotions.

For example, the third row contains the entry "Sad, Angry, Nervous", which gets split into the dummy variables "Sad" and "Angry, Nervous" instead of "Sad", "Angry" and "Nervous".
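
To illustrate the difference for that third entry: a split that stops at the first comma produces exactly the two pieces described, while `strsplit()` with a comma-plus-whitespace pattern splits at every comma. The two `sub()` calls are only a hypothetical stand-in for the two-piece behaviour, not the code from the linked answers.

```r
entry <- "Sad, Angry, Nervous"

# Hypothetical two-piece split: cut only at the first comma.
first_piece <- sub(",.*$", "", entry)         # "Sad"
rest        <- sub("^[^,]*,\\s*", "", entry)  # "Angry, Nervous"
two_pieces  <- c(first_piece, rest)

# Splitting at every comma (and any spaces after it) keeps all three emotions.
all_pieces <- strsplit(entry, ",\\s*")[[1]]   # "Sad" "Angry" "Nervous"
```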

Also, some of the tidyverse-based answers in that post seem to add extra rows to my data, which I don't want. I need to keep the same number of rows and simply add one dummy column per emotion. Any help will be greatly appreciated.
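
A sketch that only adds columns, assuming the set of emotions is known in advance (here the four named in the question): each emotion gets a 0/1 column computed with `grepl()` on the original strings, so no rows are added or removed. The `\\b` word boundaries are a precaution so that one emotion name cannot match inside a longer word.

```r
test_data <- data.frame(
  Emotion = c("Happy", "Happy, Sad", "Sad, Angry, Nervous", "Happy, Nervous",
              "Happy", "Angry", "Sad, Angry", "Happy",
              "Happy, Sad, Angry, Nervous", "Angry, Nervous", "Sad, Nervous",
              "Sad, Angry, Nervous", "Happy, Angry", "Happy, Angry, Nervous",
              "Sad, Angry, Nervous", "Happy, Sad, Angry, Nervous",
              "Angry, Nervous"),
  stringsAsFactors = FALSE
)

# Assumed, fixed set of emotions (the four named in the question).
emotions <- c("Happy", "Sad", "Angry", "Nervous")

# Add one 0/1 column per emotion. grepl() scores each row on its own,
# so the data frame keeps exactly one row per original entry.
for (e in emotions) {
  pattern <- paste0("\\b", e, "\\b")
  test_data[[e]] <- as.integer(grepl(pattern, test_data$Emotion, perl = TRUE))
}

head(test_data)
```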

Sotos
J. Doe
  • That previous post was closed with the right dupe. I tried it before closing and it works great! Try a random answer from that post: split the strings with `x <- strsplit(as.character(test_data$Emotion), ",\\s?")`, get the unique elements with `lvl <- unique(unlist(x))`, convert to factors with `x <- lapply(x, factor, levels = lvl)`, and tabulate with `t(sapply(x, table))`. The only thing you need to change is the delimiter for the split... – Sotos Nov 01 '19 at 07:44
  • That is incorrect. Using the first answer from that question, for example, I get the following dummy variables: "Happy", "Happy, Sad", "Sad, Angry, Nervous", "Happy, Nervous", "Angry", "Sad, Angry", "Happy, Sad, Angry, Nervous", "Angry, Nervous", "Sad, Nervous", "Happy, Angry", "Happy, Angry, Nervous". This is exactly what I don't need. I need just 4 dummy variables: Happy, Sad, Angry and Nervous. – J. Doe Nov 01 '19 at 07:49
  • How are you using it? It works for me! Did you change the delimiter from `;` to `, `? Copy the code I commented and try that. You are simply missing something, and I am sure it's the delimiter (separator).
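
The delimiter point can be checked directly. Below, the steps from the first comment are wrapped in a hypothetical helper, `dummify()`, so the split pattern can be swapped: with `";"` (which never occurs in this data) every distinct string stays whole and becomes its own column, giving the same 11 combinations listed in the second comment, while `",\\s?"` yields the four emotion columns.

```r
test_data <- data.frame(
  Emotion = c("Happy", "Happy, Sad", "Sad, Angry, Nervous", "Happy, Nervous",
              "Happy", "Angry", "Sad, Angry", "Happy",
              "Happy, Sad, Angry, Nervous", "Angry, Nervous", "Sad, Nervous",
              "Sad, Angry, Nervous", "Happy, Angry", "Happy, Angry, Nervous",
              "Sad, Angry, Nervous", "Happy, Sad, Angry, Nervous",
              "Angry, Nervous"),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the steps from the first comment,
# with the split pattern as a parameter.
dummify <- function(x, sep) {
  parts <- strsplit(as.character(x), sep)       # split the strings
  lvl <- unique(unlist(parts))                  # unique elements become columns
  parts <- lapply(parts, factor, levels = lvl)  # same levels for every row
  t(sapply(parts, table))                       # one row per entry
}

wrong <- dummify(test_data$Emotion, ";")      # ";" never occurs: entries stay whole
right <- dummify(test_data$Emotion, ",\\s?")  # splits at every comma

ncol(wrong)  # 11: one column per distinct combination
ncol(right)  # 4: Happy, Sad, Angry, Nervous
```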
