I have a data frame that has several hundred thousand rows and 6 columns. Each column contains IDs (there are about 500 unique IDs in total). I would like to convert this data frame into a large table/matrix with each unique ID having its own column, and having a -1, 0, or 1 for each entry corresponding to the following logic: 0 if the ID is not present, -1 if the ID is in the first 3 columns, and 1 if the ID is in the last 3 columns.
I can use the brute force approach, looping through each row, one-by-one, but I'm looking for a faster and more polished way to do this. My preference would be to use a dplyr solution, assuming one exists. I'm guessing there's also a nifty way to do this with data.table, or even just a simple base R approach. Any help would be appreciated!
Thanks in advance. Here's an example of what my data looks like:
df <- data.frame(matrix(c("XX001","XX002","XX003","XX007","XX008","XX009",
"XX001","XX004","XX005","XX006","XX010","XX008",
"XX003","XX002","XX005","XX008","XX009","XX010",
"XX002","XX005","XX003","XX009","XX007","XX010",
"XX001","XX002","XX004","XX007","XX009","XX006"),
nrow=5, ncol=6, byrow=1))
names(df) <- c("ID_X1","ID_X2","ID_X3","ID_Y1","ID_Y2","ID_Y3")
df
> df
ID_X1 ID_X2 ID_X3 ID_Y1 ID_Y2 ID_Y3
1 XX001 XX002 XX003 XX007 XX008 XX009
2 XX001 XX004 XX005 XX006 XX010 XX008
3 XX003 XX002 XX005 XX008 XX009 XX010
4 XX002 XX005 XX003 XX009 XX007 XX010
5 XX001 XX002 XX004 XX007 XX009 XX006
And here's what I would like my output to look like:
> yay
XX001 XX002 XX003 XX004 XX005 XX006 XX007 XX008 XX009 XX010 ... XX500
1 -1 -1 -1 0 0 0 1 1 1 0 ... 0
2 -1 0 0 -1 -1 1 0 1 0 1 ... 0
3 0 -1 -1 0 -1 0 0 1 1 1 ... 0
4 0 -1 -1 0 -1 0 1 0 1 1 ... 0
5 -1 -1 0 -1 0 1 1 0 1 0 ... 0
. Run `rlang::last_error()` to see where the error occurred.