I have this data:

df <- data.frame(
  Sequ = c(NA, 8, 8, NA, 1, 1, 1, NA, NA, NA, 22, 22, NA),
  Q = c(NA, "q_x", "", NA, "q_2", "", "", NA, NA, NA, "q_xyz", "", NA)
)

I'd like to replace the non-NA values in Sequ with a consecutive run-length id. What I've tried so far does give the desired result, but I feel there must be a more efficient, more concise way:

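library(dplyr)

cats <- c("q_x", "q_2", "q_xyz")
df %>%
  # count run starts: each value listed in cats opens a new run
  mutate(Sequ = cumsum(Q %in% cats)) %>%
  # restore NA on the rows that do not belong to any run
  mutate(Sequ = ifelse(is.na(Q), NA, Sequ))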
   Sequ     Q
1    NA  <NA>
2     1   q_x
3     1      
4    NA  <NA>
5     2   q_2
6     2      
7     2      
8    NA  <NA>
9    NA  <NA>
10   NA  <NA>
11    3 q_xyz
12    3      
13   NA  <NA>

Any help?

Chris Ruehlemann

1 Answer

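Another possible solution: use grepl() to flag the rows where a run starts, take the cumulative sum of those flags, and write it back only where Sequ is not NA: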

library(dplyr)

df %>%
  mutate(Sequ = replace(Sequ, !is.na(Sequ), cumsum(grepl('q', Q))[!is.na(Sequ)]))

   Sequ     Q
1    NA  <NA>
2     1   q_x
3     1      
4    NA  <NA>
5     2   q_2
6     2      
7     2      
8    NA  <NA>
9    NA  <NA>
10   NA  <NA>
11    3 q_xyz
12    3      
13   NA  <NA>
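
For comparison, a base R variant that does not look at Q at all could be sketched as follows. It is not part of the solution above. It assumes that runs in Sequ are always separated by at least one NA, as they are in the example data, so two adjacent runs with different values would be counted as one. The names not_na and starts are only illustrative.

```r
df <- data.frame(
  Sequ = c(NA, 8, 8, NA, 1, 1, 1, NA, NA, NA, 22, 22, NA),
  Q = c(NA, "q_x", "", NA, "q_2", "", "", NA, NA, NA, "q_xyz", "", NA)
)

# TRUE for rows that belong to a run
not_na <- !is.na(df$Sequ)

# a run starts where a non-NA row follows an NA row (or is the first row);
# assumption: runs are always separated by at least one NA
starts <- not_na & !c(FALSE, head(not_na, -1))

# number the runs and keep NA everywhere else
df$Sequ <- ifelse(not_na, cumsum(starts), NA_integer_)
```

On the example data this gives the same Sequ column as the output shown above.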
Sotos