I would like to add a column, call it "grpRnk", to the data frame nCode below. It should give each group's rank among the other groups in the data frame, where a group is any set of rows sharing a Group value != 0. The top rank (1) goes to the group with the lowest associated nmCnt, and the rank number increases from there as nmCnt increases for the other groups. Rows with Group = 0 should get 0. The desired result is shown in the column I added manually ("grpRnk ADD") at the far right of the data frame output below:
> print.data.frame(nCode)
  Name Group nmCnt seqBase subGrp   grpRnk ADD
1    B     0     1       1      0   0  since Group = 0
2    R     0     1       1      0   0  since Group = 0
3    R     1     2       2      1   2  since it is in 2nd place among the Groups: its nmCnt > the nmCnt of the highest-ranking Group (row 6)
4    R     1     3       2      2   2  same reason as above
5    B     0     2       2      0   0  since Group = 0
6    X     2     1       1      1   1  since it is in 1st place among the Groups: its nmCnt of 1 is the lowest among all the groups
7    X     2     2       1      2   1  same reason as above
Any recommendations for how to do this in base R or dplyr?
Below is the code that generates the above (except for the column manually added on the right):
library(dplyr)
library(tidyr)   # needed for fill()
library(stringr)
myDF5 <-
  data.frame(
    Name = c("B","R","R","R","B","X","X"),
    Group = c(0,0,1,1,0,2,2)
  )
nCode <- myDF5 %>%
  group_by(Name) %>%
  mutate(nmCnt = row_number()) %>%
  ungroup() %>%
  mutate(seqBase = ifelse(Group == 0 | Group != lag(Group), nmCnt, 0)) %>%
  mutate(seqBase = na_if(seqBase, 0)) %>%
  group_by(Name) %>%
  fill(seqBase) %>%
  mutate(seqBase = match(seqBase, unique(seqBase))) %>%
  ungroup() %>%
  mutate(subGrp = as.integer(ifelse(
    Group > 0,
    sapply(1:n(), function(x) sum(Name[1:x] == Name[x] & Group[1:x] == Group[x])),
    0
  )))
print.data.frame(nCode)
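For reference, here is a rough sketch of the kind of ranking I am after, not a definitive solution. It assumes that a group's rank is decided by the smallest nmCnt found within that group, and that groups tying on that value would share a rank (the example data has no ties, so this is a guess). The names nCode_demo, grpMin and ranked are made up for illustration, and nmCnt is hard-coded here to match the output above so the snippet runs on its own.

```r
library(dplyr)

# Stand-in for nCode with only the columns the ranking needs
# (nmCnt values copied from the printed output above).
nCode_demo <- data.frame(
  Name  = c("B", "R", "R", "R", "B", "X", "X"),
  Group = c(0, 0, 1, 1, 0, 2, 2),
  nmCnt = c(1, 1, 2, 3, 2, 1, 2)
)

ranked <- nCode_demo %>%
  group_by(Group) %>%
  mutate(grpMin = min(nmCnt)) %>%   # assumption: lowest nmCnt in the group decides its rank
  ungroup() %>%
  mutate(
    # Hide Group == 0 rows from the ranking (dense_rank leaves NA as NA),
    # then give those rows a rank of 0.
    grpRnk = ifelse(
      Group == 0,
      0L,
      dense_rank(ifelse(Group == 0, NA_real_, grpMin))
    )
  ) %>%
  select(-grpMin)

print.data.frame(ranked)
```

On the example data this should give grpRnk values of 0, 0, 2, 2, 0, 1, 1, matching the manually added column. If ties ought to be handled differently, min_rank() or row_number() could be swapped in for dense_rank().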