I have a dataframe like this
library(plyr)
library(dplyr)
ID <- c("ID001","ID002","ID003","ID004","ID005",
"ID006","ID007","ID008","ID009","ID010")
Fail <- c(3,7,2,3,9,7,3,2,3,9)
Pass <- c(0,0,4,26,1,1,3,0,9,9)
df <- data.frame(ID,Fail,Pass)
I add another column to calculate fail percentage
df$Fail_Percentage <- (df$Fail/(df$Fail+df$Pass))*100
Now, I order this data frame and create a variable "Rank"
library(data.table)
df <- df%>%
arrange(-Fail_Percentage) %>%
mutate(Rank = rleid(Fail_Percentage))
df
I get this output
ID Fail Pass Fail_Percentage Rank
ID001 3 0 100.00000 1
ID002 7 0 100.00000 1
ID008 2 0 100.00000 1
ID005 9 1 90.00000 2
ID006 7 1 87.50000 3
ID007 3 3 50.00000 4
ID010 9 9 50.00000 4
ID003 2 4 33.33333 5
ID009 3 9 25.00000 6
ID004 3 26 10.34483 7
The problem here is that I create duplicate ranks this way. I want to rank by giving preference to the "Fails".
For example: ID001,ID002,ID008 all have rank 1 but I want to rank by giving preference to highest fails by pass percentage. So ID002 will have rank 1, ID001 will have rank 2 and ID008 will have rank 3. I want to do it this way and similarly rank other entries too.
My desired output would be
ID Fail Pass Fail_Percentage Rank
ID002 7 0 100.00000 1
ID001 3 0 100.00000 2
ID008 2 0 100.00000 3
ID005 9 1 90.00000 4
ID006 7 1 87.50000 5
ID010 9 9 50.00000 6
ID007 3 3 50.00000 7
ID003 2 4 33.33333 8
ID009 3 9 25.00000 9
ID004 3 26 10.34483 10
How can we do this better? Could someone help me point in the right direction?