I need to create a running counter that records the nth time each "appln_id" appears in the data frame. The column "numorder" is what I am trying to get:

   appln_id numberclass     weight order numorder
1         1         558 0.10000000     1        1
2         1         558 0.10000000     2        2
3         1         558 0.10000000     3        3
4         1         558 0.10000000     4        4
5         1         558 0.10000000     5        5
6         2          88 0.00435817     6        1
7         2         282 0.00435817     7        2
8         2         282 0.00435817     8        3
9         2         282 0.00435817     9        4
10        2         282 0.00435817    10        5

I am sure there is a way to do this with dplyr, but I haven't been able to find a function that creates such a per-group counter.

dput(mini)
    structure(list(appln_id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), numberclass = c(558L, 
    558L, 558L, 558L, 558L, 88L, 282L, 282L, 282L, 282L), weight = c(0.1, 
    0.1, 0.1, 0.1, 0.1, 0.00435816993464052, 0.00435816993464052, 
    0.00435816993464052, 0.00435816993464052, 0.00435816993464052
    ), order = 1:10), row.names = c(NA, -10L), class = c("data.table", 
    "data.frame"))
IceCreamToucan
Boletus

2 Answers

A dplyr possibility:

library(dplyr)

# df is the data frame from the question (mini)
df %>%
  group_by(appln_id) %>%
  mutate(numorder = row_number())

   appln_id numberclass  weight order numorder
      <dbl>       <int>   <dbl> <int>    <int>
 1        1         558 0.1         1        1
 2        1         558 0.1         2        2
 3        1         558 0.1         3        3
 4        1         558 0.1         4        4
 5        1         558 0.1         5        5
 6        2          88 0.00436     6        1
 7        2         282 0.00436     7        2
 8        2         282 0.00436     8        3
 9        2         282 0.00436     9        4
10        2         282 0.00436    10        5

Or:

df %>%
 group_by(appln_id) %>%
 mutate(numorder = 1:n())
tmfmnk
  • Why accept a `dplyr` answer when your data is obviously `data.table` and the other answer clearly gives you the same results? – r2evans Jun 28 '19 at 20:10
  • @r2evans I have no idea; however, the OP tagged the question with `dplyr`. – tmfmnk Jun 28 '19 at 20:13
  • good point ... `fread`, then? – r2evans Jun 28 '19 at 20:14
  • I upvoted both answers, but since I wanted a dplyr answer, I accepted the one written with dplyr. Both reach the same result, but I cannot accept two answers. – Boletus Jun 28 '19 at 20:17
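library(data.table)

# Add a per-group counter by reference; .N is the number of rows in each appln_id group
mini[, numorder := seq_len(.N), by = "appln_id"]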
#     appln_id numberclass     weight order numorder
#  1:        1         558 0.10000000     1        1
#  2:        1         558 0.10000000     2        2
#  3:        1         558 0.10000000     3        3
#  4:        1         558 0.10000000     4        4
#  5:        1         558 0.10000000     5        5
#  6:        2          88 0.00435817     6        1
#  7:        2         282 0.00435817     7        2
#  8:        2         282 0.00435817     8        3
#  9:        2         282 0.00435817     9        4
# 10:        2         282 0.00435817    10        5
r2evans
  • Great, let me check that I understand the code. I first index into mini with square brackets, then put a comma because I am operating on the columns. I create a new variable called numorder from seq_len(.N), grouped by the column named by the string "appln_id", right? – Boletus Jun 28 '19 at 19:34
  • The `seq_len(.N)` is executed once for each unique `appln_id`, and `.N` is the number of rows for that id. – r2evans Jun 28 '19 at 20:07
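
To make the last comment concrete, here is a rough base R sketch of the same per-group logic. It is only an illustration and not how data.table implements `by`; the names `rows_by_id` and `n` are made up for this example, and the `appln_id` vector is copied from the question's `dput` output.

```r
# appln_id values from the question's data
appln_id <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

# Row positions belonging to each unique appln_id
rows_by_id <- split(seq_along(appln_id), appln_id)

numorder <- integer(length(appln_id))
for (rows in rows_by_id) {
  n <- length(rows)             # plays the role of .N for this group
  numorder[rows] <- seq_len(n)  # evaluated once per unique appln_id
}

numorder
# [1] 1 2 3 4 5 1 2 3 4 5
```

The loop body runs once per group, so the counter restarts at 1 each time a new `appln_id` is processed, which is what both answers do for you in a single call.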