Add index to runs of equal values, accounting for NA

Question

This is an example of my data:

df <- data.frame(dyad = c("a", "a", "b", NA, "c", NA, "c", "b"))
df
#   dyad
# 1    a
# 2    a
# 3    b
# 4 <NA>
# 5    c
# 6 <NA>
# 7    c
# 8    b

I want to create an index for consecutive runs of identical dyad values.

Note 1: a dyad value may be repeated throughout the data frame, but it should always get a new unique label if it is not consecutive to the previous rows with the same dyad. E.g. the "b" on rows 3 and 8 should have different ids.

Note 2: identical dyad values before and after an NA should have different ids. E.g. the "c" before and the "c" after the last NA should have different ids.

Thus, the expected result is:

#   dyad event
# 1    a     1
# 2    a     1
# 3    b     2
# 4 <NA>    NA
# 5    c     3
# 6 <NA>    NA
# 7    c     4
# 8    b     5

Any insight into how to make this work, or other advice, is welcome!

One possibility: `r = rle(df$dyad)`; `r$values[!is.na(r$values)] = seq_along(r$values[!is.na(r$values)])`; `df$rid = inverse.rle(r)` — Henrik, Jul 13 '20 at 12:15
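A minimal sketch of the `rle()` idea from the comment above, wrapped in a hypothetical helper `run_index()` (the name is illustrative, not from the thread). Base R's `rle()` never treats two `NA`s as equal, so every `NA` becomes its own run, and numbering only the non-`NA` runs therefore also covers consecutive `NA`s. Unlike the one-liner, it builds a separate integer id vector instead of writing numbers into the character `values`, so the result is integer rather than character. `rle()` rejects factors, so the column must be character (the `data.frame()` default since R 4.0.0; set explicitly here).

```r
# Hypothetical helper wrapping the rle()/inverse.rle() idea from the comment.
run_index <- function(x) {
  r <- rle(x)                        # NA never equals NA, so each NA is its own run
  keep <- !is.na(r$values)           # runs that hold an actual value
  ids <- rep(NA_integer_, length(r$values))
  ids[keep] <- seq_len(sum(keep))    # number only the non-NA runs
  r$values <- ids                    # replace each run's value with its id
  inverse.rle(r)                     # expand back to one id per element
}

df <- data.frame(dyad = c("a", "a", "b", NA, "c", NA, "c", "b"),
                 stringsAsFactors = FALSE)
df$event <- run_index(df$dyad)
df$event
#[1]  1  1  2 NA  3 NA  4  5

run_index(c("a", NA, NA, "a"))       # consecutive NAs
#[1]  1 NA NA  2
```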

Ronak Shah · Accepted Answer · 2020-07-13T13:26:47.863


Using rleid from data.table and cumsum.

library(data.table)

df$event <- rleid(df$dyad) - cumsum(is.na(df$dyad))
df$event[is.na(df$dyad)] <- NA
df

#  dyad event
#1    a     1
#2    a     1
#3    b     2
#4 <NA>    NA
#5    c     3
#6 <NA>    NA
#7    c     4
#8    b     5
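To see why the subtraction works, and where it breaks, here is a sketch that spells out the intermediate steps in a hypothetical helper `event_v1()`. `rleid()` gives every run its own id, including runs of `NA` (consecutive `NA`s share one id), and `cumsum(is.na(x))` counts the `NA`s seen so far. Subtracting that count removes exactly the ids consumed by `NA` runs only when every `NA` run has length one; a run of two `NA`s consumes one id but raises the count by two, so later ids come out one too low.

```r
library(data.table)

# Hypothetical helper reproducing the answer's one-liner step by step.
event_v1 <- function(x) {
  ids  <- rleid(x)              # one id per run; consecutive NAs share a run
  gaps <- cumsum(is.na(x))      # number of NA elements seen so far
  out  <- ids - gaps            # remove the ids that NA elements consumed
  out[is.na(x)] <- NA           # NA rows get no event
  out
}

event_v1(c("a", "a", "b", NA, "c", NA, "c", "b"))
#[1]  1  1  2 NA  3 NA  4  5

event_v1(c("a", NA, NA, "a"))   # the last value should be 2
#[1]  1 NA NA  1
```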

The above solution does not work when there are consecutive NAs; in that case we can use:

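x = c("a", NA, NA, "a", "b", "b", "c", NA)
# TRUE at the first element of each run (via rleid), but only for non-NA runs;
# the cumulative sum then numbers those runs
y <- cumsum(!duplicated(rleid(x)) & !is.na(x))
y[is.na(x)] <- NA   # NA rows get no event
y
#[1]  1 NA NA  2  3  3  4 NA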
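If you would rather not depend on data.table, the same idea (count where each non-NA run starts) can be sketched in base R by comparing each element with its predecessor instead of calling `rleid()`. `run_starts_index()` is a hypothetical name, and this is an alternative, not a transcription of the answer's code.

```r
# Hypothetical base R helper: number each run of equal non-NA values.
run_starts_index <- function(x) {
  prev <- c(NA, x[-length(x)])      # previous element (NA for the first one)
  # a run starts where the value is present and the previous element is NA
  # or different; FALSE & NA is FALSE, so NA rows never count as starts
  starts <- !is.na(x) & (is.na(prev) | x != prev)
  out <- cumsum(starts)             # running count of run starts
  out[is.na(x)] <- NA               # NA rows get no event
  out
}

run_starts_index(c("a", NA, NA, "a", "b", "b", "c", NA))
#[1]  1 NA NA  2  3  3  4 NA

run_starts_index(c("a", "a", "b", NA, "c", NA, "c", "b"))
#[1]  1  1  2 NA  3 NA  4  5
```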

edited Jul 13 '20 at 13:26

answered Jul 13 '20 at 12:34

Ronak Shah


Thank you! This worked, as did the solution provided by @Henrik in the comments above. – valo Jul 13 '20 at 12:40
@Ronak Shah, how does this handle runs of more than one `NA`? E.g. for `x = c("a", NA, NA, "a")`, I would expect `1 NA NA 2`. This might not be an issue for the OP; I was just thinking of a more general solution. Any ideas? Cheers. – Henrik Jul 13 '20 at 13:16

Yes, you are right. It does not work with consecutive `NA`s. I have edited the answer to handle that case, although I think I have overcomplicated the solution and there might be a simpler one. – Ronak Shah Jul 13 '20 at 13:27
