How to revert one-hot encoded variable back into single column?

Question

I have a dataset:

data <- data.frame(a = c(1,0,0,1,0),
                   b = c(0,1,1,0,0),
                   c = c(0,0,0,0,1))

How would I turn this into a single categorical column that looks like this:

data$tranformed <- c("A","B","B","A","C")

989 · Accepted Answer · 2016-10-18T12:17:53.783

You could do this:

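# Row/column positions of every 1 in the data
w <- which(data == 1, arr.ind = TRUE)
# Sort by row, then map each column index to its upper-cased name
data$tranformed <- toupper(names(data)[w[order(w[, 1]), 2]])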

#  a b c tranformed
#1 1 0 0          A
#2 0 1 0          B
#3 0 1 0          B
#4 1 0 0          A
#5 0 0 1          C

This is preferable to hard-coding the letters because the labels come from the column names: if you rename the columns, the result changes accordingly.

A more concise way uses max.col(), which returns the column index of each row's maximum:

data$tranformed <- toupper(names(data)[max.col(data)])
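
One caveat worth knowing: `max.col()` breaks ties at random by default (`ties.method = "random"`). With strictly one-hot rows there is exactly one maximum per row, so the default is harmless. A minimal sketch that makes the choice deterministic anyway, assuming the same three columns as in the question, could look like this:

```r
# Same toy data as in the question, rebuilt so the snippet runs on its own
data <- data.frame(a = c(1, 0, 0, 1, 0),
                   b = c(0, 1, 1, 0, 0),
                   c = c(0, 0, 0, 0, 1))

# ties.method = "first" picks the left-most maximum instead of a random one
idx <- max.col(data, ties.method = "first")
data$tranformed <- toupper(names(data)[idx])
data$tranformed
```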

If the data can contain rows without any 1, like this:

#  a b c
#1 1 0 0
#2 0 1 0
#3 0 0 0
#4 1 0 0
#5 0 0 1

data <- structure(list(a = c(1, 0, 0, 1, 0), b = c(0, 1, 0, 0, 0), c = c(0, 
0, 0, 0, 1)), .Names = c("a", "b", "c"), row.names = c(NA, -5L
), class = "data.frame")

You could do this:

inds <- which(rowSums(data)==0)
data$tranformed <- toupper(names(data)[max.col(data)])
data$tranformed[inds] <- NA

Which will give you:

#  a b c tranformed
#1 1 0 0          A
#2 0 1 0          B
#3 0 0 0       <NA>
#4 1 0 0          A
#5 0 0 1          C
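
If the indicator columns are only part of a wider data frame, the same idea can be wrapped in a small function. The sketch below is one possible way to do that; `from_one_hot` and its `cols` argument are hypothetical names introduced here for illustration, not something from base R or from the answer above.

```r
# Hypothetical helper: decode a block of one-hot columns into labels.
# `cols` selects the indicator columns by name or by position.
from_one_hot <- function(df, cols = names(df)) {
  m <- as.matrix(df[, cols, drop = FALSE])
  hits <- rowSums(m == 1)                      # number of 1s in each row
  out <- toupper(colnames(m))[max.col(m, ties.method = "first")]
  out[hits != 1] <- NA                         # all-zero rows or several 1s
  out
}

data <- data.frame(a = c(1, 0, 0, 1, 0),
                   b = c(0, 1, 0, 0, 0),
                   c = c(0, 0, 0, 0, 1))
data$tranformed <- from_one_hot(data, c("a", "b", "c"))
data$tranformed
```

Because the columns are named explicitly, the call can be repeated after `tranformed` has been added without the new character column getting mixed into the matrix.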

In one go: `data$tranformed<-ifelse(rowSums(df)>=1,toupper(names(df)[max.col(df)]),NA)`. Much simpler than my way, well done. – Haboryme, Oct 18 '16 at 12:33
Very elegant solution! Thank you both for sharing. – ishido, Oct 18 '16 at 14:04

Haboryme · Answer 2 · 2016-10-18T11:45:57.117

data$transformed <- factor(apply(data, 1, function(x) which(x == 1)), levels = seq_along(data), labels = colnames(data))

or, with hard-coded letters (use `letters` instead of `LETTERS` for lowercase):

factor(LETTERS[apply(data, 1, function(x) which(x == 1))])

EDIT: If a row contains only 0s, like the 3rd row in the following example:

df <- data.frame(a = c(1,0,0,1,0),
                 b = c(0,1,0,0,0),
                 c = c(0,0,0,0,1))
  a b c
1 1 0 0
2 0 1 0
3 0 0 0
4 1 0 0
5 0 0 1

You can't use the solutions above, because apply() then returns a list instead of a vector: the element for the all-zero row has length 0.
A workaround:

LETTERS[unlist(ifelse(sapply(apply(df, 1, function(x) which(x == 1)),length)==1,apply(df, 1, function(x) which(x == 1)),NA))]
[1] "A" "B" NA  "A" "C"
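
The workaround calls `apply()` twice. As a sketch of the same idea with a single pass (a variation using `vapply()`, not the code from this answer), each row can be mapped to one index, or to `NA` when it does not contain exactly one 1:

```r
df <- data.frame(a = c(1, 0, 0, 1, 0),
                 b = c(0, 1, 0, 0, 0),
                 c = c(0, 0, 0, 0, 1))

# With an all-zero row, apply() cannot simplify its result to a vector
hits <- apply(df, 1, function(x) which(x == 1))
is.list(hits)   # row 3 contributes an empty integer vector

# One index per row, or NA when the row does not hold exactly one 1
idx <- vapply(hits,
              function(h) if (length(h) == 1) h[[1]] else NA_integer_,
              integer(1))
LETTERS[idx]
```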

edited Oct 18 '16 at 11:45

answered Oct 18 '16 at 10:12

Haboryme


Hi, I used your first suggestion and it worked. Then I tried using it on the next category, `data.lab$drinksubcat <- factor(apply(data.lab[,35:46],1, function(x) which(x == 1)),labels = colnames(data.lab[,35:46]))`, but got the error `Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?` Do you know why I would get this? – ishido Oct 18 '16 at 11:03
Can you show a sample of your data? – Haboryme Oct 18 '16 at 11:14
I see now, for that category, sometimes none of the columns have a 1 and all of them are 0's. `data$LSM1 <- c(0,1,1,1,0) data$LSM2 <- c(1,0,0,0,0)` This must be the problem – ishido Oct 18 '16 at 11:17
Yes, neither method can handle only 0's. I'll see if there is a way around to output NA when this is the case. – Haboryme Oct 18 '16 at 11:22
See my edit for a possible way to deal with such a case. – Haboryme Oct 18 '16 at 11:46
