51

When I use the dplyr function group_by() and immediately follow it with arrange(), I expect the data frame to be ordered within the groups named in group_by(). My reading of the documentation is that this combination should produce that result, but it is not what I get when I try it, and googling did not turn up anyone else with the same issue. Am I wrong to expect this result?

Here is an example, using the R built-in dataset ToothGrowth:

library(dplyr)
ToothGrowth %>%
  group_by(supp) %>%
  arrange(len)

Running this produces a data frame that is ordered by len across the whole data frame, not within each level of supp.

This is the code that produces the desired output:

ToothGrowth %>%
  group_by(supp) %>%
  do( data.frame(with(data=., .[order(len),] )) )
Hrvoje
  • Can you file a bug report please? – hadley Jul 09 '14 at 23:51
  • Can you please link to this bug report? – Paul Rougieux Jan 20 '15 at 11:09
  • @Paul4forest The [issue](https://github.com/hadley/dplyr/issues/491) is closed, so either it is already in the current release or it is still in the development branch. – Hrvoje Jan 21 '15 at 15:59
  • @Hrvoje thanks for the link. According to Hadley's test case, `arrange()` sorts first by the columns in `group_by()` and then by those in `arrange()`: `test_that("grouped arrange sorts first by group", { df1 <- mtcars %>% group_by(cyl) %>% arrange(disp) %>% ungroup(); df2 <- mtcars %>% arrange(cyl, disp); expect_equal(df1, df2) })` – Paul Rougieux Jan 22 '15 at 10:51
  • @PaulRougieux, I don't fully understand your reply. `mtcars %>% group_by(cyl) %>% arrange(disp) %>% ungroup()` produces the same undesired result as `mtcars %>% group_by(cyl) %>% arrange(disp)`. It is not the same as `arrange(cyl, disp)`. – user1700890 Dec 11 '16 at 22:34
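
To check the disagreement in the comments above against whatever dplyr version is installed, a minimal sketch like the following compares the pipelines directly. It assumes a dplyr recent enough to have the `.by_group` argument; the names `cars`, `grouped`, `by_group` and `explicit` are only illustrative. No expected result is given for the first comparison because it depends on the dplyr version.

```r
library(dplyr)

# Drop row names up front so every pipeline starts from the same tibble
cars <- as_tibble(mtcars)

# Pipeline from the test case quoted in the comments
grouped <- cars %>% group_by(cyl) %>% arrange(disp) %>% ungroup()

# Same pipeline, but explicitly asking arrange() to sort by group first
by_group <- cars %>%
  group_by(cyl) %>%
  arrange(disp, .by_group = TRUE) %>%
  ungroup()

# Explicit two-key sort
explicit <- cars %>% arrange(cyl, disp)

# Version-dependent: TRUE only if a grouped arrange() sorts by group first
identical(grouped$disp, explicit$disp)

# Compares the sort keys of the .by_group result with the explicit sort
identical(by_group$cyl, explicit$cyl) && identical(by_group$disp, explicit$disp)
```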

3 Answers

94

You can produce the expected behaviour by setting .by_group = TRUE in arrange:

library(dplyr)
ToothGrowth %>%
    group_by(supp) %>%
    arrange(len, .by_group = TRUE)
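
As a quick sanity check, here is a small sketch that verifies the ordering rather than eyeballing the printout. The object name `sorted` is just a placeholder, and it assumes a dplyr version that supports `.by_group`.

```r
library(dplyr)

sorted <- ToothGrowth %>%
  group_by(supp) %>%
  arrange(len, .by_group = TRUE)

# The grouping column should come out in factor-level order ...
!is.unsorted(as.integer(sorted$supp))

# ... and len should be non-decreasing within each level of supp
tapply(sorted$len, sorted$supp, function(x) !is.unsorted(x))
```

Note that the result is still grouped by supp, so add `ungroup()` if later steps should not operate per group.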
David Rubinger
  • FWIW this is the answer I was looking for given the stated question. – d8aninja Apr 10 '18 at 15:55
  • That helped me too! – Dan Dec 19 '18 at 10:23
  • This should be the accepted answer. [This](https://stackoverflow.com/questions/33881405/how-to-sort-groups-within-sorted-groups) question especially needs an answer like this one. – Zimano Jan 24 '20 at 09:41
20

I think you want

ToothGrowth %>%
  arrange(supp,len)

The chaining system just replaces nested commands, so you are first grouping and then ordering that grouped result, which breaks the original ordering.

JeremyS
  • Thanks for the suggestion. Although it fixes my particular problem, I think it would not work in more general cases where you might want to preserve the ordering of the original `supp` variable. – Hrvoje Jul 11 '14 at 12:55
  • Then make supp a factor and specify the ordering using levels. – JeremyS Jul 15 '14 at 02:12
  • I want to do exactly this. Why can't it just work the way you think it should (i.e. group by first, then arrange)? – Alex Sep 02 '14 at 23:35
  • I prefer it this way because it is more explicitly following directions. Before you would ask it to sort a variable and it would be insubordinate until you added an `ungroup()` that could snip away all the snags keeping the arrange from working as requested. – leerssej Dec 18 '16 at 05:01
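
The factor-levels suggestion from the comments could look roughly like this. It is only a sketch under one assumption: that "preserving the original ordering" means the order in which each `supp` value first appears in the data. The names `first_seen` and `reordered` are made up for the example.

```r
library(dplyr)

# Assumption: "original ordering" = order of first appearance in the data
first_seen <- unique(as.character(ToothGrowth$supp))

reordered <- ToothGrowth %>%
  # Re-level supp so that arrange() follows first-appearance order
  mutate(supp = factor(supp, levels = first_seen)) %>%
  # Sort by group first, then by len within each group
  arrange(supp, len)

levels(reordered$supp)
head(reordered)
```

Because arrange() sorts a factor by its level order rather than alphabetically, changing `levels` is enough to control which group comes first.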
2

Another way to fix this unexpected ordering while still using group_by() is to convert the grouped_df back to a data frame. group_by() is still needed for summaries, for example:

ToothGrowthMeanLen <- ToothGrowth %>%
    group_by(supp, dose) %>%
    summarise(meanlen = mean(len))

This summary table is not arranged in order of meanlen:

ToothGrowthMeanLen %>%
    arrange(meanlen)

This summary table is arranged in order of meanlen:

ToothGrowthMeanLen %>%
    data.frame() %>%   # Convert to a simple data frame
    arrange(meanlen)

Converting the grouped_df back to a data frame was the first way I found to sort a summarised data frame, but dplyr::ungroup() exists for exactly that purpose:

ToothGrowthMeanLen %>%
    ungroup() %>%   # Remove grouping
    arrange(meanlen)
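
To see why the grouping lingers after the summary, a short sketch like this one inspects the remaining grouping variables before and after `ungroup()`. The names `mean_len` and `sorted_means` are placeholders. As far as I know, newer dplyr versions ignore grouping in arrange() unless `.by_group = TRUE` is set, so the `ungroup()` may be redundant there, but it is harmless either way.

```r
library(dplyr)

mean_len <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarise(meanlen = mean(len))

# summarise() peels off only the last grouping level,
# so the result is still grouped by supp
group_vars(mean_len)

sorted_means <- mean_len %>%
  ungroup() %>%        # remove the leftover grouping
  arrange(meanlen)     # sort across all rows

# No grouping variables remain, and meanlen is sorted overall
group_vars(sorted_means)
!is.unsorted(sorted_means$meanlen)
```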
Paul Rougieux