I have an R data frame df, as below:

ID   <- c(1,2,1,3,3,3)
Time <- c("7:30","10:30","11:00","4:00","8:00","8:00")
sub_event <- c("TLIF","ALIF","ALIF","ALIF","TLIF","LAMI")
df <- data.frame(ID,Time,sub_event)

I am interested in the sequence of events by ID. For example, ID 1 has 2 events, at 7:30 and 11:00, and they should be seq 1 and 2 respectively for ID 1. The sequence is the order of events within an ID. Also, an event can be broken into sub-events, and rowid is not useful. I am looking for output like this:

  ID  Time seq
1  1  7:30   1
2  2 10:30   1
3  1 11:00   2
4  3  4:00   1
5  3  8:00   2
alistaire
user3897
  • 1
    Pretty sure what you are asking is unclear. How is this sequence defined? – shayaa Aug 12 '16 at 20:07
  • 1
    @shayaa, MrFlick it's not what I first assumed - sequence can continue after interruption. Note the first ID 3 is part of `seq` 1, because it is the first 3. – Gregor Thomas Aug 12 '16 at 20:10
  • 3
    Don't use `cbind` to make data.frames; just use `data.frame`, i.e. `data.frame(ID, Time)` (or define `ID` and `Time` inside the call). Otherwise all your data will be coerced to the same data type (`cbind` makes a matrix, which can only hold one), and you'll quite possibly introduce bugs. Here, `ID` will end up as a numeric factor, which will almost certainly end up confusing you later. – alistaire Aug 12 '16 at 20:13
  • 1
    @Gregor that said, it's definitely a duplicate – MichaelChirico Aug 12 '16 at 20:17
  • @MichaelChirico can you please point me to the post. MrFlick's reference is not what I am looking for. – user3897 Aug 12 '16 at 20:19
  • Hmm, maybe not an exact duplicate, but `rowid` is demonstrated also [here](http://stackoverflow.com/questions/34730544/unelegant-decorate-count-undecorate-on-data-table-cumulative-sum/34731106#34731106) and [here](http://stackoverflow.com/questions/38722565/r-loop-optimisation-loop-is-way-too-time-consuming/38722778#38722778) – MichaelChirico Aug 12 '16 at 20:26
  • In my reading, with dplyr, `library(dplyr) ; df %>% mutate(Time = chron::times(paste0(Time, ':00'))) %>% group_by(ID) %>% arrange(Time) %>% mutate(seq = seq(n()))` – alistaire Aug 12 '16 at 20:29

1 Answer

Using the new rowid function (data.table 1.9.8+):

library(data.table)

setDT(df)

df[ , seq := rowid(ID)][]
#    ID  Time seq
# 1:  1  7:30   1
# 2:  2 10:30   1
# 3:  1 11:00   2
# 4:  3  4:00   1
# 5:  3  8:00   2
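
If the data frame keeps the sub_event column, as in the question's code, it has six rows, and ID 3 appears twice at 8:00 (TLIF and LAMI). Numbering rows directly would then give that second sub-event its own sequence number. One possible way around this, sketched in base R rather than data.table, is to collapse to one row per (ID, Time) pair before numbering. The names events, per_event and event_seq are illustrative and not from the question, and the sketch assumes rows are already in time order within each ID (the Time strings would need converting to a real time class before sorting).

```r
# Data from the question: six rows, because the 8:00 event for ID 3
# has two sub-events (TLIF and LAMI).
events <- data.frame(
  ID        = c(1, 2, 1, 3, 3, 3),
  Time      = c("7:30", "10:30", "11:00", "4:00", "8:00", "8:00"),
  sub_event = c("TLIF", "ALIF", "ALIF", "ALIF", "TLIF", "LAMI")
)

# Collapse sub-events: keep one row per distinct (ID, Time) pair.
per_event <- unique(events[, c("ID", "Time")])

# Number the events within each ID in row order
# (assumption: rows are already in time order within each ID).
per_event$event_seq <- ave(seq_along(per_event$ID), per_event$ID,
                           FUN = seq_along)

# Optionally copy the event number back onto every sub-event row,
# so both 8:00 rows for ID 3 share the same number.
key_all   <- paste(events$ID, events$Time)
key_event <- paste(per_event$ID, per_event$Time)
events$event_seq <- per_event$event_seq[match(key_all, key_event)]

print(per_event)
print(events)
```

Here per_event has the five rows the question asks for, while events keeps all six sub-event rows with a shared event number for the two 8:00 rows.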

As a general note, I advise against overloading base functions with your variable names -- both seq and df are functions. It will eventually come back to haunt you.

MichaelChirico