
How do I number the rows of ordered, grouped data using dplyr? For example, how would I create the ordering column below, which restarts at 1 for each Species?

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species ordering
          (dbl)       (dbl)        (dbl)       (dbl)     (fctr)    (int)
 1           4.6         3.6          1.0         0.2     setosa        1
 2           4.3         3.0          1.1         0.1     setosa        2
 3           5.8         4.0          1.2         0.2     setosa        3
 4           5.0         3.2          1.2         0.2     setosa        4
 5           4.7         3.2          1.3         0.2     setosa        5
...
46           5.7         3.8          1.7         0.3     setosa       46
47           5.4         3.4          1.7         0.2     setosa       47
48           5.1         3.3          1.7         0.5     setosa       48
49           4.8         3.4          1.9         0.2     setosa       49
50           5.1         3.8          1.9         0.4     setosa       50
51           5.1         2.5          3.0         1.1 versicolor        1
52           4.9         2.4          3.3         1.0 versicolor        2
53           5.0         2.3          3.3         1.0 versicolor        3
54           5.0         2.0          3.5         1.0 versicolor        4
55           5.7         2.6          3.5         1.0 versicolor        5
...
99           6.7         3.0          5.0         1.7 versicolor       49
100          6.0         2.7          5.1         1.6 versicolor       50
101          4.9         2.5          4.5         1.7  virginica        1
102          6.2         2.8          4.8         1.8  virginica        2

...

James Owers
  • Hi @Henrik, I have edited the question to specifically reference `dplyr`, to differentiate it from the duplicate question. I wrote this up because my Google and Stack Overflow searches were unfruitful. – James Owers Dec 11 '15 at 15:36
  • Hi @kungfujam Did you read the Q&A I linked to? Please see [my `dplyr` answer there](http://stackoverflow.com/questions/11996135/create-a-unique-sequential-number-for-each-row-within-each-group-subset-of-a-da/30596227#30596227). – Henrik Dec 11 '15 at 15:47
  • If by Q&A you mean the [linked question](http://stackoverflow.com/questions/11996135/create-a-unique-sequential-number-for-each-row-within-each-group-subset-of-a-da) then yes I saw this and your answer. Thank you for this, I wouldn't have found out about `row_number()` otherwise. As far as SO goes, what is the correct etiquette at this point? I couldn't find your answer before; I've edited my question (and now title) to reflect what I was looking for which I believe is a separate question. Your answer is applicable to both questions. – James Owers Dec 11 '15 at 15:52
  • You can just keep your question here. It will serve as a pointer to a 'more canonical' Q&A (Question and Answer). Someone might use search terms that lead them to your question, and the link to the duplicate will then lead them to the answer. I assume you have visited the help center and read [this](http://stackoverflow.com/help/duplicates). – Henrik Dec 11 '15 at 16:01
  • You assume incorrectly. Thank you very much for the pointer. I note that your answer on the previous question is not accepted. I'm very happy to "give way" on this question, as it were, and accept your answer here, seeing as you were first (and given your efforts). – James Owers Dec 11 '15 at 16:03
  • I try to follow the [recommendations on duplicate questions](http://meta.stackexchange.com/questions/10841/how-should-duplicate-questions-be-handled): "Should I answer [a duplicate]? No, not if you think it's a duplicate. If you don't think the answers on the duplicate question are good enough, write an answer there.". I think this is a sound policy - try to create canonical Q&As and try to keep the signal-noise ratio high. As you see, the dupe I linked to was posted 3 years ago, and I added my `dplyr` answer there quite recently. Cheers. – Henrik Dec 14 '15 at 16:33
  • Cool beans dude. Cheers for the education. – James Owers Dec 14 '15 at 17:04
  • No problem! Good luck. – Henrik Dec 14 '15 at 17:54

1 Answer

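# Sort by Species first: recent dplyr versions of arrange() ignore grouping by default
iris %>% group_by(Species) %>% arrange(Species, Petal.Length) %>% mutate(ordering = 1:n())

or even better

iris %>% group_by(Species) %>% arrange(Species, Petal.Length) %>% mutate(ordering = row_number())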

Note that n() returns the number of rows in the current group, so 1:n() yields the vector c(1, 2, 3, ..., n()) for each group. row_number() produces the same numbering without building the sequence by hand.
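To make the equivalence concrete, here is a minimal sketch on a small toy data frame. The data frame `df` and the column names `g`, `x`, `via_n` and `via_row_number` are hypothetical and not part of the original question.

```r
library(dplyr)

# Toy data (hypothetical, for illustration only)
df <- data.frame(
  g = c("a", "a", "a", "b", "b"),
  x = c(3, 1, 2, 5, 4)
)

numbered <- df %>%
  arrange(g, x) %>%                 # sort by group, then by value
  group_by(g) %>%
  mutate(
    via_n          = 1:n(),         # n() is the group size, so this is 1, 2, ..., n()
    via_row_number = row_number()   # same numbering, no explicit sequence needed
  ) %>%
  ungroup()

numbered
```

Both columns should read 1, 2, 3 for group a and 1, 2 for group b.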

You can also use a more complicated grouping. For example, to group mtcars by number of cylinders, number of gears, and number of carburetors, and number the rows within each group in order of miles per gallon (select() is only there to keep the display readable):

mtcars %>% group_by(cyl, gear, carb) %>% 
    arrange(cyl, gear, carb, mpg) %>%  # group keys first: recent dplyr's arrange() ignores grouping by default
    mutate(ordering = 1:n()) %>% 
    select(cyl, gear, carb, mpg, ordering)
Source: local data frame [32 x 5]
Groups: cyl, gear, carb [12]

     cyl  gear  carb   mpg ordering
   (dbl) (dbl) (dbl) (dbl)    (int)
1      4     3     1  21.5        1
2      4     4     1  22.8        1
3      4     4     1  27.3        2
4      4     4     1  32.4        3
5      4     4     1  33.9        4
6      4     4     2  21.4        1
7      4     4     2  22.8        2
8      4     4     2  24.4        3
9      4     4     2  30.4        4
10     4     5     2  26.0        1
..   ...   ...   ...   ...      ...
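
As a hedged alternative, assuming a dplyr version recent enough to support the `.by_group` argument of `arrange()`, the same numbering can be sketched as follows. The names `ranked` and `check` are hypothetical helpers added here to confirm that the numbering restarts in every group.

```r
library(dplyr)

ranked <- mtcars %>%
  group_by(cyl, gear, carb) %>%
  arrange(mpg, .by_group = TRUE) %>%   # sort by cyl, gear, carb, then mpg
  mutate(ordering = row_number()) %>%
  ungroup() %>%
  select(cyl, gear, carb, mpg, ordering)

# Sanity check: the largest `ordering` in each group equals the group size
check <- ranked %>%
  group_by(cyl, gear, carb) %>%
  summarise(size = n(), last = max(ordering)) %>%
  ungroup()

all(check$size == check$last)
```

The final expression should return TRUE, since each group is numbered 1 through its own row count.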
James Owers