0

I have the following variables in a dataframe:

App.Date         App.No.
01/01/2012       A0001
01/01/2012       A0082
01/01/2012       F0003
02/01/2012       A0004
02/01/2012       A0055
03/01/2012       P0006
03/01/2012       A0007
03/01/2012       A0008
03/01/2012       A0009
.........        ......

I want to create a variable which will track the ordering on each date The output will be another variable which will sequence and start on each change of date

App.Date         App.No.  Seq
01/01/2012       A0001    1
01/01/2012       A0082    2
01/01/2012       F0003    3
02/01/2012       A0004    1
02/01/2012       A0055    2
03/01/2012       P0006    1
03/01/2012       A0007    2
03/01/2012       A0008    3
03/01/2012       A0009    4
.........        ......   .

How can I do it in R without for loop ? There are over 100k rows, have to create a series on each change of date.

James Z
  • 12,209
  • 10
  • 24
  • 44
Harry
  • 198
  • 12

1 Answers1

1

You can use dplyr with the mutate function to create a new column which will be a vector from 1 to the number of rows within each group of date:

library(dplyr)
df <- df %>% group_by(App.Date) %>% mutate(seq = 1:n())
df    

# Source: local data frame [9 x 3]
# Groups: App.Date [3]

#     App.Date App.No.   seq
#       <fctr>  <fctr> <int>
# 1 01/01/2012   A0001     1
# 2 01/01/2012   A0082     2
# 3 01/01/2012   F0003     3
# 4 02/01/2012   A0004     1
# 5 02/01/2012   A0055     2
# 6 03/01/2012   P0006     1
# 7 03/01/2012   A0007     2
# 8 03/01/2012   A0008     3
# 9 03/01/2012   A0009     4

Or use ave to do a cumsum on a sequence of ones grouped by App.Date:

df$Seq <- ave(rep(1, nrow(df)), df$App.Date, FUN = cumsum)
df$Seq
# [1] 1 2 3 1 2 1 2 3 4

When you get familiar with data.table package:

library(data.table)
setDT(df)
df[, Seq := 1:.N, .(App.Date)]
Psidom
  • 209,562
  • 33
  • 339
  • 356