How to count subsets of elements in a data frame using base R or dplyr?

Question

I would like to add a column, call it "grpRnk", to the data frame nCode below. It should give each group's rank among the other groups in the data frame, where a group is a set of rows with a nonzero Group value. Rank 1 goes to the group with the lowest associated nmCnt, and the rank increases from there as the other groups' nmCnt increases. The desired values are shown in the manually added column "grpRnk ADD" on the far right of the output below:

> print.data.frame(nCode)
  Name Group nmCnt seqBase subGrp  grpRnk ADD
1    B     0     1       1      0    0 since Group = 0
2    R     0     1       1      0    0 since Group = 0
3    R     1     2       2      1    2 since it is 2nd place among the Groups, with its nmCnt > the nmCnt for the highest ranking Group in row 6
4    R     1     3       2      2    2 same reason as above
5    B     0     2       2      0    0 since Group = 0
6    X     2     1       1      1    1 since it is 1st place among the Groups, as its nmCnt of 1 is the lowest among all the groups
7    X     2     2       1      2    1 same reason as above
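The ranking rule above can be sketched in base R. This is a minimal sketch, not the accepted answer's method, and it rests on assumptions: the data are hard-coded from the printed output, a group is identified by its Group value alone, and groups that tie on their minimum nmCnt share a rank.

```r
# Hard-coded copy of the relevant columns of nCode (taken from the printed output)
nCode <- data.frame(
  Name  = c("B", "R", "R", "R", "B", "X", "X"),
  Group = c(0, 0, 1, 1, 0, 2, 2),
  nmCnt = c(1, 1, 2, 3, 2, 1, 2)
)

# Assumption: a group is identified by its Group value alone (Group != 0)
inGrp <- nCode$Group != 0

# Smallest nmCnt in each group: Group 1 -> 2, Group 2 -> 1
grpMin <- tapply(nCode$nmCnt[inGrp], nCode$Group[inGrp], min)

# Rank groups by that minimum; tied minima share a rank (an assumption)
grpRank <- match(grpMin, sort(unique(grpMin)))
names(grpRank) <- names(grpMin)

# Look up each row's group rank; non-group rows get 0, as in the question
nCode$grpRnk <- ifelse(inGrp, grpRank[as.character(nCode$Group)], 0L)
print(nCode)
```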

Any recommendations for how to do this in base R or dplyr?

Below is the code that generates the above (except for the column manually added on the right):

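library(dplyr)
library(tidyr)   # fill() comes from tidyr, not dplyr

myDF5 <- 
  data.frame(
    Name = c("B","R","R","R","B","X","X"),
    Group = c(0,0,1,1,0,2,2)
  )

nCode <-  myDF5 %>%
  # running count of rows within each Name
  group_by(Name) %>%
  mutate(nmCnt = row_number()) %>%
  ungroup() %>%
  # keep nmCnt where Group is 0 or differs from the previous row, else NA
  mutate(seqBase = ifelse(Group == 0 | Group != lag(Group), nmCnt, 0)) %>%
  mutate(seqBase = na_if(seqBase, 0)) %>%
  # carry those values down within each Name, then renumber them 1, 2, ...
  group_by(Name) %>%
  fill(seqBase) %>%
  mutate(seqBase = match(seqBase, unique(seqBase))) %>%
  ungroup() %>%
  # running count within each Name/Group pair; 0 for non-group rows
  mutate(subGrp = as.integer(ifelse(
    Group > 0,
    sapply(1:n(), function(x) sum(Name[1:x] == Name[x] & Group[1:x] == Group[x])),
    0
  )))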

print.data.frame(nCode)
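Since the question also asks about base R, here is a hedged sketch of how the nmCnt and subGrp columns could be built with ave() instead of dplyr. It assumes the same myDF5 input, leaves seqBase out, and is an alternative rather than the code used above; runCnt is a hypothetical helper name.

```r
myDF5 <- data.frame(
  Name  = c("B", "R", "R", "R", "B", "X", "X"),
  Group = c(0, 0, 1, 1, 0, 2, 2)
)

# nmCnt: running count of rows within each Name
myDF5$nmCnt <- ave(seq_along(myDF5$Name), myDF5$Name, FUN = seq_along)

# subGrp: running count within each Name/Group pair, 0 for non-group rows
runCnt <- ave(seq_along(myDF5$Name), myDF5$Name, myDF5$Group, FUN = seq_along)
myDF5$subGrp <- ifelse(myDF5$Group > 0, runCnt, 0L)

print(myDF5)
```

ave() applies FUN separately within each combination of the grouping vectors and returns the results in the original row order, which is what a per-row running count needs.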

score 0 · Accepted Answer · answered Sep 17 '22 at 08:17

Here's a dplyr solution. However, instead of filling non-group rows with 0 as in my original post, this code puts NA in non-group rows, which works better for what I intend to use it for. The dplyr slice() function used in my solution was new to me and is very useful; I learned about it from the post "dplyr filter: Get rows with minimum of variable, but only the first if multiple minima".
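The "only the first if multiple minima" behaviour comes from which.min(), which returns the index of the first minimum. The small sketch below uses a hypothetical data frame df holding the group rows of nCode, and, assuming dplyr 1.0 or later, shows that slice_min(..., n = 1, with_ties = FALSE) picks the same rows here.

```r
library(dplyr)

# which.min() returns the index of the FIRST minimum when there are ties
which.min(c(3, 1, 1, 2))   # returns 2

# Hypothetical df: the group rows (Group > 0) of nCode
df <- data.frame(
  Name  = c("R", "R", "X", "X"),
  Group = c(1, 1, 2, 2),
  nmCnt = c(2, 3, 1, 2)
)

# One row per Name: the first row holding that Name's smallest Group
a <- df %>% group_by(Name) %>% slice(which.min(Group)) %>% ungroup()

# slice_min() without ties keeps the same rows for this data
b <- df %>% group_by(Name) %>% slice_min(Group, n = 1, with_ties = FALSE) %>% ungroup()
```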

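grpRnk <- nCode %>% select(Name,Group,nmCnt) %>% 
  filter(Group > 0) %>%            # keep only rows that belong to a group
  group_by(Name) %>% 
  slice(which.min(Group)) %>%      # one row per Name: first row with its smallest Group
  arrange(nmCnt) %>%               # sort those rows by nmCnt (arrange() ignores grouping by default)
  select(-nmCnt)
grpRnk$grpRnk <- as.integer(row.names(grpRnk))  # rank = row position after sorting
left_join(nCode, grpRnk, by = c("Name", "Group"))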
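A variant that groups by Group rather than Name may be more robust if one Name can belong to several groups, and it avoids relying on row.names() of a tibble. This is a sketch under assumptions: nCode is hard-coded from the printed output, groups tying on their minimum nmCnt share a rank via dense_rank(), and grpRnk0 is a hypothetical column name for the 0-filled version requested in the question.

```r
library(dplyr)

# Hard-coded copy of the relevant columns of nCode
nCode <- data.frame(
  Name  = c("B", "R", "R", "R", "B", "X", "X"),
  Group = c(0, 0, 1, 1, 0, 2, 2),
  nmCnt = c(1, 1, 2, 3, 2, 1, 2)
)

grpRnk <- nCode %>%
  filter(Group > 0) %>%
  group_by(Group) %>%
  # one row per group: the smallest nmCnt in that group
  summarise(minCnt = min(nmCnt), .groups = "drop") %>%
  # rank by that minimum; ties share a rank (an assumption)
  mutate(grpRnk = dense_rank(minCnt)) %>%
  select(Group, grpRnk)

# Join back on Group: non-group rows get NA, as in the accepted answer;
# grpRnk0 (hypothetical name) replaces NA with 0 as asked in the question
out <- nCode %>%
  left_join(grpRnk, by = "Group") %>%
  mutate(grpRnk0 = coalesce(grpRnk, 0L))
print(out)
```

Swapping dense_rank() for min_rank() or row_number() changes only how tied groups are numbered.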
