calculate a column with the order of a date in other column

Question

I want to do a simple task and can not figure out how. I have a dataframe (data.table in fact) similar to this:

date         code1 
2015-03-01     A
2015-03-02     A
2015-03-03     A
2015-03-01     B
2015-03-02     B
2015-03-03     B

and I want a new column with the date order by code1 as follows:

date         code1   order
2015-03-01     A       1
2015-03-02     A       2
2015-03-03     A       3
2015-03-01     B       1
2015-03-02     B       2
2015-03-03     B       3

thanks in advance.

In base R, `df$order <- unlist(lapply(split(df, df$code1), function(x){seq_along(x$date)}))` — alistaire, Mar 15 '16 at 19:42
...or better in case they're not in order, `unlist(tapply(df$date, df$code1, rank))` — alistaire, Mar 15 '16 at 19:54

mtoto · Accepted Answer · 2016-03-15T20:04:41.310

1

Using data.table:

dt[, order := seq(.N), by = code1]

> dt
#         date code1 order
#1: 2015-03-01     A     1
#2: 2015-03-02     A     2
#3: 2015-03-03     A     3
#4: 2015-03-01     B     1
#5: 2015-03-02     B     2
#6: 2015-03-03     B     3

edited Mar 15 '16 at 20:04

answered Mar 15 '16 at 19:45

mtoto

23,919
4
58
71

this is what is was looking for. it performed really fine in a >100 millon rows dataframe. thanks! – COLO Mar 15 '16 at 20:00
1

From the next version, it's also possible to do: `dt[, order := rowid(code1)]`. – Arun Mar 15 '16 at 20:26

akrun · Answer 2 · 2016-03-15T20:00:20.933

We can use dplyr

library(dplyr)
df %>%
   group_by(code1) %>%
   mutate(Order = row_number())
   date code1 Order
#      (chr) (chr) (int)
#1 2015-03-01     A     1
#2 2015-03-02     A     2
#3 2015-03-03     A     3
#4 2015-03-01     B     1
#5 2015-03-02     B     2
#6 2015-03-03     B     3

As @alistaire mentioned, if the 'date' column is not ordered, we can either use arrange to order by 'date' after the group_by and then create the sequence with mutate or use rank

df %>% 
  group_by(code1) %>% 
  mutate(Order = rank(date))

Or using a convenient wrapper from splitstackshape

library(splitstackshape)
getanID(df, 'code1')[]

Or with base R

df$Order <- with(df, ave(seq_along(code1), code1, FUN= seq_along))

In case they're not in order, it's probably safer to do `df %>% group_by(code1) %>% mutate(order = rank(date))` — alistaire, Mar 15 '16 at 19:56

calculate a column with the order of a date in other column

2 Answers2