
I have a long vector of patient statuses in R, sorted chronologically, along with a vector of the associated patient IDs. The status vector is a column of a data frame. I would like to label consecutive rows for which the patient status is the same. If the status changes and then reverts to its original value, that counts as three separate events. This differs from most situations I found while searching, where `duplicated` or `match` would suffice.

An example would be along the lines of:

s <- c(0,0,0,1,1,1,0,0,2,1,1,0,0)
id <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)

and the desired output would be (restarting the count for each patient)

flag <- c(1,1,1,2,2,2,3,1,2,3,3,4,4)

or (counting across all patients)

flag <- c(1,1,1,2,2,2,3,4,5,6,6,7,7)

One inelegant approach would be to generate the sequence:

unlist(tapply(s, id, function(x) cumsum(c(T, x[-1] != rev(rev(x)[-1])))))

Is there a better way?

Jaap
AdamO
  • Possible duplicate of https://stackoverflow.com/questions/37809094/create-group-names-for-consecutive-values – zx8754 Jan 26 '18 at 19:02
  • Also related: [*Is there a dplyr equivalent to data.table::rleid?*](https://stackoverflow.com/q/33507868/2204410) – Jaap Jan 26 '18 at 20:05
  • @zx8754 My question is different. My group label should be strictly increasing; the status indicator returning to an earlier value indicates a new episode. For their example, my desired label would be: `0, 0, 0, 0, 1, 1, 2`. Cat 1 is <1000 for 3 cases, Cat 2 > 1000 for case 1, < 1000 for cases 2 and 3, then >1000 again for case 4. – AdamO Jan 26 '18 at 21:23

2 Answers


I think you could use rleid from data.table for this:

library(data.table)
rleid(s,id)

Output:

1 1 1 2 2 2 3 4 5 6 6 7 7

Or for the first sequence:

data.table(s,id)[,rleid(s),id]$V1

Output:

 1 1 1 2 2 2 3 1 2 3 3 4 4
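Since the statuses live in a data frame, the labels could also be stored as new columns. A minimal sketch, assuming columns named `s` and `id` as in the question (the names `patients`, `flag` and `flag_all` are only illustrative):

```r
library(data.table)

# Example data from the question
patients <- data.table(
  s  = c(0,0,0,1,1,1,0,0,2,1,1,0,0),
  id = c(1,1,1,1,1,1,1,2,2,2,2,2,2)
)

# Run id added by reference, restarting at 1 for each patient
patients[, flag := rleid(s), by = id]

# Alternative: a single counter that keeps increasing across patients
patients[, flag_all := rleid(id, s)]
```

Both columns stay aligned with the original rows, so no reordering is needed afterwards.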
Florian

Run Length Encoding - rle()

# rle() gives the run lengths per patient; each run's index is repeated over its length
tapply(s, id, function(x) { v <- rle(x)$lengths; rep(seq_along(v), v) })
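If the labels need to line up with the rows of the data frame, base R's `ave()` may be a convenient wrapper, because it returns a vector in the original row order rather than one list element per patient. A sketch using the example `s` and `id` from the question, assuming both are numeric (the helper name `run_id` is made up for illustration):

```r
s  <- c(0,0,0,1,1,1,0,0,2,1,1,0,0)
id <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)

# Hypothetical helper: number the runs of equal values in x as 1, 2, 3, ...
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Per-patient labels, aligned with the original rows
flag <- ave(s, id, FUN = run_id)

# Single counter across patients: a new run starts whenever s or id changes
# (relies on diff(), so it assumes numeric s and id)
flag_all <- cumsum(c(TRUE, diff(s) != 0 | diff(id) != 0))
```

On the example data, `flag` restarts at 1 for patient 2, while `flag_all` keeps increasing.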

Vlo