I would like a function that works like cumsum but, rather than adding up values, counts the number of unique values seen so far. I could write a loop for each potential set, but that seems like it could get time-consuming, as my dataset has millions of observations.

Example:

a <- c(1,3,2,4,1,5,2,3)
f(a)
[1] 1 2 3 4 4 5 5 5
Francis Smart

2 Answers

You can try:

cumsum(!duplicated(a))
#[1] 1 2 3 4 4 5 5 5
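
To see why this works, here is a minimal sketch that wraps the one-liner as the f() from the question. The function name f comes from the question; the intermediate variable first_seen is only an illustrative name. duplicated() marks every repeat of a value already seen, so its negation is TRUE exactly at first occurrences, and cumsum() of that logical vector is the running count of distinct values.

```r
# Hypothetical wrapper matching the f() used in the question
f <- function(x) {
  first_seen <- !duplicated(x)  # TRUE at the first occurrence of each value
  cumsum(first_seen)            # running count of first occurrences
}

a <- c(1, 3, 2, 4, 1, 5, 2, 3)
!duplicated(a)
#[1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE
f(a)
#[1] 1 2 3 4 4 5 5 5
```

Both duplicated() and cumsum() are vectorised base R functions, so no explicit loop is needed.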
nicola

We can try

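library(zoo)
b <- a                                   # work on a copy so a stays intact for the options below
b[duplicated(b)] <- NA                   # blank out repeated values
b[!is.na(b)] <- seq_along(b[!is.na(b)])  # number the first occurrences 1, 2, 3, ...
na.locf(b)                               # carry the last count forward over the NAs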
#[1] 1 2 3 4 4 5 5 5

Or another option is

cumsum(ave(a, a, FUN=seq_along)==1)
#[1] 1 2 3 4 4 5 5 5
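
As a rough illustration of what the ave() option is doing, the steps can be split apart. The variable name occ is only illustrative. ave(a, a, FUN=seq_along) numbers each element by how many times its value has appeared so far, so comparing with 1 picks out first occurrences.

```r
a <- c(1, 3, 2, 4, 1, 5, 2, 3)

# Within each group of equal values, number the occurrences 1, 2, ...
occ <- ave(a, a, FUN = seq_along)
occ
#[1] 1 1 1 1 2 1 2 2

# occ == 1 is TRUE at first occurrences; cumsum counts them
cumsum(occ == 1)
#[1] 1 2 3 4 4 5 5 5
```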

Or a compact option would be

library(splitstackshape)
getanID(a)[, cumsum(.id==1)]
#[1] 1 2 3 4 4 5 5 5
akrun