
This is an example of my data:

df <- data.frame(dyad = c("a", "a", "b", NA, "c", NA, "c", "b"))
df
#   dyad
# 1    a
# 2    a
# 3    b
# 4 <NA>
# 5    c
# 6 <NA>
# 7    c
# 8    b

I want to create an index for consecutive runs of identical dyad values.

Note 1: a dyad might be repeated throughout the data frame, but it should always get a new unique id if it is not consecutive to the previous rows with the same dyad. E.g. the "b" on rows 3 and 8 should have different ids.

Note 2: identical dyad before and after NA should have different id. E.g. the "c" before and after the last NA should have a different id.

Thus, the expected result is:

#   dyad event
# 1    a     1
# 2    a     1
# 3    b     2
# 4 <NA>    NA
# 5    c     3
# 6 <NA>    NA
# 7    c     4
# 8    b     5

Any insight or advice on how to make this work is welcome!

Henrik
valo
    One possibility: `r = rle(df$dyad)`; `r$values[!is.na(r$values)] = seq_along(r$values[!is.na(r$values)])`; `df$rid = inverse.rle(r)` – Henrik Jul 13 '20 at 12:15
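
The `rle` idea in the comment above can be spelled out as a small base R sketch. This is one reading of that comment, not necessarily the commenter's exact code: it builds a separate integer vector of ids (the name `ids` is only illustrative), because assigning numbers directly into the character `r$values` would coerce them to character. The `as.character()` call is an assumption to guard against `dyad` being a factor.

```r
df <- data.frame(dyad = c("a", "a", "b", NA, "c", NA, "c", "b"))

# Run-length encode the column; base rle() treats every NA as its own run
r <- rle(as.character(df$dyad))

# Number only the non-NA runs, leaving NA runs as NA
keep <- !is.na(r$values)
ids <- rep(NA_integer_, length(r$values))
ids[keep] <- seq_len(sum(keep))

# Expand the run ids back to one value per row
r$values <- ids
df$event <- inverse.rle(r)
df
```

Since each `NA` forms its own run in `rle()`, consecutive `NA`s should not need special handling here.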

1 Answer


Using rleid from data.table and cumsum.

library(data.table)

df$event <- rleid(df$dyad) - cumsum(is.na(df$dyad))
df$event[is.na(df$dyad)] <- NA
df

#  dyad event
#1    a     1
#2    a     1
#3    b     2
#4 <NA>    NA
#5    c     3
#6 <NA>    NA
#7    c     4
#8    b     5

The solution above does not work when there are consecutive NAs. In that case, we can use:

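# Flag the first element of each run, ignoring NA runs, then count the flags
x <- c("a", NA, NA, "a", "b", "b", "c", NA)
y <- cumsum(!duplicated(rleid(x)) & !is.na(x))
# Blank out the NA positions, which would otherwise carry the previous id
y[is.na(x)] <- NA
y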
#[1]  1 NA NA  2  3  3  4 NA
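
For readers without data.table, the same "count the run starts" logic can be sketched in base R. This is an alternative formulation, not the code above: it assumes a run starts wherever a non-NA value differs from the previous element or follows an `NA`, and the names `prev` and `starts` are only illustrative.

```r
x <- c("a", NA, NA, "a", "b", "b", "c", NA)

# Previous element of x, with NA standing in before the first position
prev <- c(NA, head(x, -1))

# A run starts at a non-NA value that follows an NA or a different value
starts <- !is.na(x) & (is.na(prev) | x != prev)

# Count the starts seen so far, then blank out the NA positions
y <- cumsum(starts)
y[is.na(x)] <- NA
y
#[1]  1 NA NA  2  3  3  4 NA
```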
Ronak Shah
  • Thank you! This also worked as well as the solution provided by @Henrik in the comments above. – valo Jul 13 '20 at 12:40
  • @Ronak Shah, How does this handle runs of more than one `NA`? E.g. for `x = c("a", NA, NA, "a")`, I would expect `1 NA NA 2`. This might not be an issue for OP, I was just thinking of a bit more general solution. Any ideas? Cheers. – Henrik Jul 13 '20 at 13:16
  • Yes, you are right. It does not work with consecutive `NA`'s. I made an edit to update the answer. Although I think I have over complicated the solution and there might be a simpler solution to this. – Ronak Shah Jul 13 '20 at 13:27