Suppose I have data that has the following format:
ID VALUE
a a
a b
d b
d c
What I would like to do is a one hot-encoding for the ID value. When I use model.matrix
, I obtain:
model.matrix(~VALUE-1, df)
ID aVALUE bVALUE cVALUE
a 1 0 0
a 0 1 0
d 0 1 0
d 0 0 1
What I would like to get however is this:
ID aVALUE bVALUE cVALUE
a 1 1 0
d 0 1 1
The other part to this is that my data frame is approximately 30 million rows - so I am looking for an efficient way to do this. Any help or comments would be greatly appreciated!
Thank you!