I have a data frame with IDs and string values, some of which I prefer over others:
library(dplyr)
d1 <- data.frame(id = c("a", "a", "b", "b"),
                 value = c("good", "better", "good", "good"))
I want to handle this the same way as the following example with numbers:
d2 <- data.frame(id = c("a", "a", "b", "b"),
                 value = c(1, 2, 1, 1))
d2 %>%
  group_by(id) %>%
  summarize(max(value))
So if an ID has multiple values, I will always get the highest number for each ID:
# A tibble: 2 x 2
id `max(value)`
<fct> <dbl>
1 a 2
2 b 1
Equivalently, if an ID has multiple strings, I want to extract the preferred string for each ID in the d1 data frame: if an ID only has "good", use that row; if another row for the same ID has "better", use that row instead, thus eliminating duplicated IDs.
The example is arbitrary; it could just as well be: if we have "yes" and "unknown", take "yes", otherwise take "unknown".
So is there an "extract best string" function that can be used inside dplyr::summarize()?
The result should look like this:
id | value
----------
"a"| "better"
"b"| "good"
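
To make the intended logic concrete, here is a minimal sketch of one way it could be expressed. It is not a built-in "best string" function. It assumes the preference order can be written down as a character vector from worst to best; the names `pref` and `result` are made up for illustration. Each value is mapped to its position in `pref` with `match()`, the highest position per ID is taken, and that position is mapped back to the string.

```r
library(dplyr)

d1 <- data.frame(id = c("a", "a", "b", "b"),
                 value = c("good", "better", "good", "good"))

# Hypothetical preference order, from least to most preferred.
# For the other example this would be c("unknown", "yes").
pref <- c("good", "better")

result <- d1 %>%
  group_by(id) %>%
  # match() gives each value's rank in pref; max() picks the best rank,
  # and indexing pref turns that rank back into the string.
  summarize(value = pref[max(match(value, pref))])

print(result)
```

Note that a value missing from `pref` would give `NA` from `match()`, so in this sketch every possible string has to be listed in the preference vector.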