-2

I want to do a simple task and can not figure out how. I have a dataframe (data.table in fact) similar to this:

date         code1 
2015-03-01     A
2015-03-02     A
2015-03-03     A
2015-03-01     B
2015-03-02     B
2015-03-03     B

and I want a new column with the date order by code1 as follows:

date         code1   order
2015-03-01     A       1
2015-03-02     A       2
2015-03-03     A       3
2015-03-01     B       1
2015-03-02     B       2
2015-03-03     B       3

thanks in advance.

alistaire
  • 42,459
  • 4
  • 77
  • 117
COLO
  • 1,024
  • 16
  • 28

2 Answers2

1

Using data.table:

dt[, order := seq(.N), by = code1]

> dt
#         date code1 order
#1: 2015-03-01     A     1
#2: 2015-03-02     A     2
#3: 2015-03-03     A     3
#4: 2015-03-01     B     1
#5: 2015-03-02     B     2
#6: 2015-03-03     B     3
mtoto
  • 23,919
  • 4
  • 58
  • 71
  • this is what is was looking for. it performed really fine in a >100 millon rows dataframe. thanks! – COLO Mar 15 '16 at 20:00
  • 1
    From the next version, it's also possible to do: `dt[, order := rowid(code1)]`. – Arun Mar 15 '16 at 20:26
1

We can use dplyr

library(dplyr)
df %>%
   group_by(code1) %>%
   mutate(Order = row_number())
   date code1 Order
#      (chr) (chr) (int)
#1 2015-03-01     A     1
#2 2015-03-02     A     2
#3 2015-03-03     A     3
#4 2015-03-01     B     1
#5 2015-03-02     B     2
#6 2015-03-03     B     3

As @alistaire mentioned, if the 'date' column is not ordered, we can either use arrange to order by 'date' after the group_by and then create the sequence with mutate or use rank

df %>% 
  group_by(code1) %>% 
  mutate(Order = rank(date))

Or using a convenient wrapper from splitstackshape

library(splitstackshape)
getanID(df, 'code1')[]

Or with base R

df$Order <- with(df, ave(seq_along(code1), code1, FUN= seq_along)) 
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    In case they're not in order, it's probably safer to do `df %>% group_by(code1) %>% mutate(order = rank(date))` – alistaire Mar 15 '16 at 19:56