Creating sequence indicator

Question

I have an R data frame df as below:

ID   <- c(1,2,1,3,3,3)
Time <- c("7:30","10:30","11:00","4:00","8:00","8:00")
sub_event <- c("TLIF","ALIF","ALIF","ALIF","TLIF","LAMI")
df <- data.frame(ID,Time,sub_event)

I am interested in the sequence of events by ID. For example, ID 1 has 2 events, at 7:30 and 11:00, and they should be seq 1 and 2 respectively for ID 1. The sequence is the order of events for an ID. Also, an event is broken into sub-events, so rowid is not useful. I am looking for output as below:

  ID  Time seq
1  1  7:30   1
2  2 10:30   1
3  1 11:00   2
4  3  4:00   1
5  3  8:00   2

Pretty sure what you are asking is unclear. How is this sequence defined? — shayaa, Aug 12 '16 at 20:07
@shayaa, MrFlick it's not what I first assumed - sequence can continue after interruption. Note the first ID 3 is part of `seq` 1, because it is the first 3. — Gregor Thomas, Aug 12 '16 at 20:10
Don't use `cbind` to make data.frames; just use `data.frame`, i.e. `data.frame(ID, Time)` (or define `ID` and `Time` inside the call). Otherwise all your data will be coerced to the same data type (`cbind` makes a matrix, which can only hold one), and you'll quite possibly introduce bugs. Here, `ID` will end up as a numeric factor, which will almost certainly end up confusing you later. — alistaire, Aug 12 '16 at 20:13
@MichaelChirico can you please point me to the post. MrFlick's reference is not what I am looking for. — user3897, Aug 12 '16 at 20:19
Hmm, maybe not an exact duplicate, but `rowid` is demonstrated also [here](http://stackoverflow.com/questions/34730544/unelegant-decorate-count-undecorate-on-data-table-cumulative-sum/34731106#34731106) and [here](http://stackoverflow.com/questions/38722565/r-loop-optimisation-loop-is-way-too-time-consuming/38722778#38722778) — MichaelChirico, Aug 12 '16 at 20:26
In my reading, with dplyr, `library(dplyr) ; df %>% mutate(Time = chron::times(paste0(Time, ':00'))) %>% group_by(ID) %>% arrange(Time) %>% mutate(seq = seq(n()))` — alistaire, Aug 12 '16 at 20:29

MichaelChirico · Answer 1 · 2016-11-30T23:38:32.383

3

Using data.table's new rowid function (version 1.9.8+):

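library(data.table)

# Convert df from a data.frame to a data.table by reference
setDT(df)

# Number the rows within each ID in order of appearance; the trailing [] prints the result
df[ , seq := rowid(ID)][]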
#    ID  Time seq
# 1:  1  7:30   1
# 2:  2 10:30   1
# 3:  1 11:00   2
# 4:  3  4:00   1
# 5:  3  8:00   2
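
The output above has one row per (ID, Time) pair. The question's df, as defined, also carries a sub_event column with two rows for ID 3 at 8:00, and rowid(ID) on its own would number those two sub-events separately. A minimal sketch of one way to handle that, assuming the goal is one sequence number per distinct (ID, Time) pair in order of first appearance (events, event_seq and event_no are illustrative names, not from the original post):

```r
library(data.table)

# Data from the question, including the sub_event column
events <- data.table(
  ID        = c(1, 2, 1, 3, 3, 3),
  Time      = c("7:30", "10:30", "11:00", "4:00", "8:00", "8:00"),
  sub_event = c("TLIF", "ALIF", "ALIF", "ALIF", "TLIF", "LAMI")
)

# Keep one row per (ID, Time) pair, in order of first appearance
event_seq <- unique(events[, .(ID, Time)])

# Number the events within each ID
event_seq[, event_no := rowid(ID)]

# Optionally copy the event number back onto every sub-event row
events[event_seq, event_no := i.event_no, on = .(ID, Time)]
```

With this approach both sub-events of ID 3 at 8:00 share the same event number, and event_seq holds the five-row table the question asks for.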

As a general note, I advise against overloading base functions with your variable names: both seq (base) and df (the F-distribution density in stats) are functions. It will eventually come back to haunt you.
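
To illustrate the naming advice, here is a sketch of the same per-ID counter with names that do not mask existing functions (events and event_no are hypothetical names chosen for illustration). It is written in base R with ave(), as an alternative for anyone not using data.table, and assumes one row per event:

```r
# One row per event, as in the desired output of the question
events <- data.frame(
  ID   = c(1, 2, 1, 3, 3),
  Time = c("7:30", "10:30", "11:00", "4:00", "8:00")
)

# ave() applies seq_along() within each ID group and keeps the original row order
events$event_no <- ave(seq_along(events$ID), events$ID, FUN = seq_along)
```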

edited Nov 30 '16 at 23:38

answered Aug 12 '16 at 20:07


Thanks! This is close. Can I use more than one element in the "by" clause? I would like to use `df[ , seq := 1:.N, by = c("ID","Time")]`. – user3897 Aug 12 '16 at 20:28
@user3897 that should be working.... – MichaelChirico Aug 12 '16 at 20:30
Actually, it doesn't. Is there a different syntax to specify multiple arguments to the "by" clause? – user3897 Aug 12 '16 at 20:35
@user3897 you skipped the crucial `setDT(df)` step, which converts `df` from a `data.frame` to a `data.table` -- how `[]` works differs depending on whether `df` is a `data.frame` or `data.table`. – MichaelChirico Aug 12 '16 at 20:36
I haven't used data.table. I usually stick to dplyr package for most of my data cleaning. Could you please help me with the notations or point me to a reference? – user3897 Aug 12 '16 at 20:39
@user See Henrik's answer in the linked question: http://stackoverflow.com/a/30596227/ – Frank Aug 12 '16 at 20:40
@user3897 see here: https://github.com/Rdatatable/data.table/wiki/Getting-started – MichaelChirico Aug 12 '16 at 20:44
@Frank Thank You! That helps – user3897 Aug 12 '16 at 20:54
