I have a very large data frame with character values. I want to compare the rows with each other and create IDs based on the comparison. The problem is that there are NAs in the df, and I want those to be treated as matching any value. The other issue is that the IDs need to be created in the same step (or maybe I'm thinking about the problem in too complicated a way).
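Here is a minimal sketch of the NA-as-wildcard comparison for two rows, assuming plain character vectors. `na_match()` is a hypothetical helper name, not a function from any package. Two rows match when every position is either equal or NA on at least one side:

```r
# Hypothetical helper (not from any package): TRUE when two character vectors
# agree at every position, where NA on either side matches any value.
na_match <- function(x, y) {
  stopifnot(length(x) == length(y))
  # x == y gives NA wherever either side is NA; the is.na() terms turn those into TRUE
  all(x == y | is.na(x) | is.na(y))
}

na_match(c(NA, "B", "C", "A"), c("A", "B", "C", "A"))  # TRUE  (Set3 vs Set1)
na_match(c("A", "B", "C", "C"), c("A", "B", "C", "A"))  # FALSE (Set6 vs Set1)
```

Because NA matches everything, this relation is not transitive. For example, (A, NA) matches both (A, B) and (A, C), but those two do not match each other, so any grouping has to decide where such a row goes.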
Here's the toy df I created:
library(tidyverse)
library(purrr)
# make toy df
Set1 <- c("A", "B", "C","A")
Set2 <- c("A", "D", "B", "A")
Set3 <- c(NA, "B", "C", "A")
Set4 <- c("A", "G", "B", "A")
Set5 <- c("F", "G", NA, "F")
Set6 <- c("A", "B", "C", "C")
sets <- rbind(Set1, Set2, Set3, Set4, Set5, Set6)  # note: rbind() returns a character matrix, not a data frame
colnames(sets) <- c("Var1", "Var2", "Var3", "Var4")
sets
Var1 Var2 Var3 Var4
Set1 "A" "B" "C" "A"
Set2 "A" "D" "B" "A"
Set3 NA "B" "C" "A"
Set4 "A" "G" "B" "A"
Set5 "F" "G" NA "F"
Set6 "A" "B" "C" "C"
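To see which sets should end up together, the comparison can be applied to every pair of rows. This sketch rebuilds the toy data, taking Set4 from the code as c("A", "G", "B", "A"), and uses the hypothetical `na_match()` helper. It builds an n-by-n logical matrix, so it is mainly useful for checking the rule on small samples rather than on the full data:

```r
# Toy data; rbind() uses the argument names as row names
sets <- rbind(
  Set1 = c("A", "B", "C", "A"),
  Set2 = c("A", "D", "B", "A"),
  Set3 = c(NA,  "B", "C", "A"),
  Set4 = c("A", "G", "B", "A"),
  Set5 = c("F", "G", NA,  "F"),
  Set6 = c("A", "B", "C", "C")
)
colnames(sets) <- c("Var1", "Var2", "Var3", "Var4")

# Hypothetical helper: equal at every position, NA matches anything
na_match <- function(x, y) all(x == y | is.na(x) | is.na(y))

# Entry [i, j] is TRUE when row i matches row j under the NA rule
n <- nrow(sets)
match_mat <- outer(seq_len(n), seq_len(n),
                   Vectorize(function(i, j) na_match(sets[i, ], sets[j, ])))
dimnames(match_mat) <- list(rownames(sets), rownames(sets))
match_mat
```

Off the diagonal, the only TRUE pair here is Set1/Set3, which is consistent with the desired output where those two share Group1.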
And here's the desired output, either as a new column or as a separate df; either one would be just as good:
# as new column
Var1 Var2 Var3 Var4 COMP
Set1 "A" "B" "C" "A" "Group1"
Set2 "A" "D" "B" "A" "Group2"
Set3 NA "B" "C" "A" "Group1"
Set4 "A" "G" "B" "A" "Group3"
Set5 "F" "G" NA "F" "Group4"
Set6 "A" "B" "C" "C" "Group5"
# as new df
COMP
Set1 "Group1"
Set2 "Group2"
Set3 "Group1"
Set4 "Group3"
Set5 "Group4"
Set6 "Group5"
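One way to get consecutive IDs is to walk the rows in order. A row joins the group of the earliest previous row it matches; otherwise it opens a new group. This is a base-R sketch, not a definitive answer. It uses the hypothetical `na_match()` helper, takes Set4 as c("A", "G", "B", "A") from the code, and makes the judgment call that a row matching several earlier groups goes to the earliest one:

```r
sets <- rbind(
  Set1 = c("A", "B", "C", "A"),
  Set2 = c("A", "D", "B", "A"),
  Set3 = c(NA,  "B", "C", "A"),
  Set4 = c("A", "G", "B", "A"),
  Set5 = c("F", "G", NA,  "F"),
  Set6 = c("A", "B", "C", "C")
)
colnames(sets) <- c("Var1", "Var2", "Var3", "Var4")

# Hypothetical helper: equal at every position, NA matches anything
na_match <- function(x, y) all(x == y | is.na(x) | is.na(y))

n <- nrow(sets)
grp <- integer(n)
next_id <- 0L
for (i in seq_len(n)) {
  # Earliest previous row that matches row i (NULL if there is none)
  j <- Find(function(k) na_match(sets[i, ], sets[k, ]), seq_len(i - 1))
  if (is.null(j)) {
    next_id <- next_id + 1L  # no match: open a new group
    grp[i] <- next_id
  } else {
    grp[i] <- grp[j]         # match: reuse that row's group
  }
}

# Desired output as a new column
result <- data.frame(sets, COMP = paste0("Group", grp))
result
```

For the separate-df layout, `data.frame(COMP = paste0("Group", grp), row.names = rownames(sets))` gives the second form. The loop does up to n(n - 1)/2 row comparisons, so it may be slow on a very large data frame.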
I'm thinking this can be achieved with rowwise() and map(), but even after reading similar questions I cannot figure out exactly how to achieve this, especially how to name the new groups consecutively and consistently. Any ideas would be much appreciated.
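Since map() came up, here is a purrr-flavoured sketch of the same earliest-match rule, assuming the purrr package is installed. `detect_index()` returns, for each row, the position of the first earlier row it matches (0 if none). Following those links back gives the row that opened each group, and `match()` against the unique openers numbers the groups consecutively in order of first appearance. `na_match()` is again a hypothetical helper, and Set4 is taken from the code as c("A", "G", "B", "A"). Because each row's group depends on the rows before it, rowwise() is probably not the easiest fit, so the rows are handled by index here:

```r
library(purrr)

sets <- rbind(
  Set1 = c("A", "B", "C", "A"),
  Set2 = c("A", "D", "B", "A"),
  Set3 = c(NA,  "B", "C", "A"),
  Set4 = c("A", "G", "B", "A"),
  Set5 = c("F", "G", NA,  "F"),
  Set6 = c("A", "B", "C", "C")
)
colnames(sets) <- c("Var1", "Var2", "Var3", "Var4")

# Hypothetical helper: equal at every position, NA matches anything
na_match <- function(x, y) all(x == y | is.na(x) | is.na(y))

# One character vector per set
rows <- map(seq_len(nrow(sets)), ~ sets[.x, ])

# Position of the earliest previous row each row matches (0 if none)
first_match <- map_int(seq_along(rows), function(i) {
  detect_index(rows[seq_len(i - 1)], ~ na_match(rows[[i]], .x))
})

# Follow the links back to the row that opened each group
opener <- map_int(seq_along(first_match), function(i) {
  while (first_match[i] != 0L) i <- first_match[i]
  i
})

# Consecutive IDs as a separate data frame
comp <- data.frame(COMP = paste0("Group", match(opener, unique(opener))),
                   row.names = rownames(sets))
comp
```

This gives the same grouping as walking the rows with a plain loop; the purrr version only changes how the earliest match is found.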