Sample data:
df <- data.frame("ID" = 1:6,
"Group1" = c("A", NA, "C", NA, "E", "C"),
"Group2" = c("E", "C", "C", NA, "E", "E"),
"Group3" = c("A", "A", NA, NA, "C", NA),
"Group4" = c(NA, "C", NA, "D", "C", NA),
"Group5" = c("A", "D", NA, NA, NA, NA))
In each row, I want to count the number of each value and store the most frequent value in a new variable, New.Group
. In case of ties, the first value in the row should be selected. The logic applied to the example:
Row 1 of New.Group
takes value A
because it is most frequent value in the row, ignoring NA
s.
Row 2 takes value C
because it is also the most frequent value.
Row 3 the same as Row 2.
Row 4 takes value D
because it's the only value in the row.
In Row 5 both E
and C
has count 2, but E
is selected because it is encountered before C
in the row.
Row 6, similar to row 5, both C
and E
has count 1, but C
is selected because it is encountered before E
in the row.
The desired output:
ID Group1 Group2 Group3 Group4 Group5 New.Group
1 1 A E A <NA> A A
2 2 <NA> C A C D C
3 3 C C <NA> <NA> <NA> C
4 4 <NA> <NA> <NA> D <NA> D
5 5 E E C C <NA> E
6 6 C E <NA> <NA> <NA> C