
I want to add an ordered ID (by date) within each group of a data frame. I can do this with dplyr, following the question "R - add column that counts sequentially within groups but repeats for duplicates":

library(dplyr)

# Example data
date <- rep(c("2016-10-06 11:56:00","2016-10-05 11:56:00","2016-10-05 11:56:00","2016-10-07 11:56:00"),2)
date <- as.POSIXct(date)
group <- c(rep("A",4), rep("B",4))
df <- data.frame(group, date)

# dplyr - dense_rank within each group
df2 <- df %>% group_by(group) %>%
       mutate(m.test = dense_rank(date))

   group                date m.test
  <fctr>              <dttm>  <int>
1      A 2016-10-06 11:56:00      2
2      A 2016-10-05 11:56:00      1
3      A 2016-10-05 11:56:00      1
4      A 2016-10-07 11:56:00      3
5      B 2016-10-06 11:56:00      2
6      B 2016-10-05 11:56:00      1
7      B 2016-10-05 11:56:00      1
8      B 2016-10-07 11:56:00      3

So my new column m.test ranks the dates within each group, and equal dates get the same ID. If I try the same thing with data.table and rleid, it doesn't work: 2016-10-05 is numbered after 2016-10-06:

library(data.table)
df3 <- setDT(df)[, m.test := rleid(date), by = group]

   group                date m.test
1:     A 2016-10-06 11:56:00      1
2:     A 2016-10-05 11:56:00      2
3:     A 2016-10-05 11:56:00      2
4:     A 2016-10-07 11:56:00      3
5:     B 2016-10-06 11:56:00      1
6:     B 2016-10-05 11:56:00      2
7:     B 2016-10-05 11:56:00      2
8:     B 2016-10-07 11:56:00      3
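For comparison, here is a minimal sketch that applies rleid() and frank() to group A's dates alone (the vector name d is just for illustration). rleid() starts a new ID whenever the value differs from the previous row, so it follows row order rather than date order:

```r
library(data.table)

# Group A's dates, in the same row order as the example data (d is illustrative)
d <- as.POSIXct(c("2016-10-06 11:56:00", "2016-10-05 11:56:00",
                  "2016-10-05 11:56:00", "2016-10-07 11:56:00"))

# rleid(): new ID each time the value changes from one row to the next
rleid(d)
#> [1] 1 2 2 3

# frank() with ties.method = "dense": rank by value, ties share a rank, no gaps
frank(d, ties.method = "dense")
#> [1] 2 1 1 3
```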

Am I getting the syntax wrong?

Henrik
Pete900
  • The `data.table` equivalent of dplyr's `dense_rank(...)` is `frank(..., ties.method = "dense")`, afaik – talat Nov 14 '16 at 12:39
  • Thanks. I was confused by the answer to a question I asked earlier (http://stackoverflow.com/questions/37008864/add-id-by-group-which-resets-to-1-in-r). I assume that rleid doesn't work for dates in this case. – Pete900 Nov 14 '16 at 12:55
  • Do you want to post an answer? – Pete900 Nov 14 '16 at 12:55
  • No, but you can answer your own question (or delete it) – talat Nov 14 '16 at 13:30

1 Answer


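Thanks to @docendo discimus. rleid is not a ranking function: it assigns a new ID every time the value changes from one row to the next, so it numbers runs in row order and only behaves like a dense rank when the rows are already sorted by date. The data.table equivalent of dplyr's dense_rank(...) is frank(..., ties.method = "dense"):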

library(data.table)
df4 <- setDT(df)[, m.test := frank(date, ties.method = "dense"), by = group]
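A self-contained sketch of the data.table approach on the example data follows. The names dt, dt2 and m.rleid are illustrative, and the second option (sorting first, then calling rleid) is an alternative added here, not part of the original answer. Once equal dates are adjacent, each run of rleid corresponds to one distinct date, so within each group it gives the same IDs as a dense rank.

```r
library(data.table)

# Example data from the question, built directly as a data.table
date  <- as.POSIXct(rep(c("2016-10-06 11:56:00", "2016-10-05 11:56:00",
                          "2016-10-05 11:56:00", "2016-10-07 11:56:00"), 2))
group <- c(rep("A", 4), rep("B", 4))
dt <- data.table(group, date)

# Option 1 (from the answer): dense rank within each group, row order unchanged
dt[, m.test := frank(date, ties.method = "dense"), by = group]

# Option 2 (added alternative, illustrative names): sort by group and date on a
# copy, so equal dates are adjacent and rleid() numbers distinct dates in order
dt2 <- copy(dt)
setorder(dt2, group, date)
dt2[, m.rleid := rleid(date), by = group]
```

Option 1 keeps the original row order, which is usually what you want. Option 2 reorders the rows, so it is only convenient if you need the data sorted anyway.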
Pete900