28

I usually use code like the following to reorder bars in ggplot or other types of plots.

Normal plot (unordered)

library(tidyverse)
iris.tr <- iris %>% group_by(Species) %>% mutate(mSW = mean(Sepal.Width)) %>%
  select(mSW,Species) %>% 
  distinct()
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) +
  geom_point(stat = "identity")

Ordering the factor + ordered plot

iris.tr$Species <- factor(iris.tr$Species,
                          levels = iris.tr[order(iris.tr$mSW),]$Species,
                          ordered = TRUE)
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) + 
  geom_point(stat = "identity")

The factor line is extremely unpleasant to me, and I wonder why arrange() or some other function can't simplify this. Am I missing something?

Note:

This does not work, but I would like to know if something like this exists in the tidyverse.

iris.tr <- iris %>% group_by(Species) %>% mutate(mSW = mean(Sepal.Width)) %>%
  select(mSW,Species) %>% 
  distinct() %>% 
  arrange(mSW)
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) + 
  geom_point(stat = "identity")
Mark
David Mas
  • Careful: you shouldn’t use `.` inside identifiers because it has a specific meaning when using S3 dispatch (use `_` instead); and you shouldn’t use `T` for `TRUE`, since it’s not a reserved word and can be redefined (`T = FALSE` for the naughty). – Konrad Rudolph Jul 17 '17 at 16:15
  • Maybe I am completely wrong, but I thought that was the correct way to name identifiers in R. I saw it in [Google's R Style Guide](https://google.github.io/styleguide/Rguide.xml#identifiers) – David Mas Jul 17 '17 at 16:24
  • Google’s style guides are generally a bit crap. Ignore them. Here’s a better style guide for R: http://style.tidyverse.org/. I disagree with some of the points (capital letters in filenames?! what. the. heck.) but it’s definitely acceptable and widely used in R. – Konrad Rudolph Jul 17 '17 at 16:25
  • Okay, looks interesting I'll have a look! EDIT: Changed T for TRUE – David Mas Jul 17 '17 at 16:27

4 Answers

24

Using ‹forcats›:

iris.tr %>%
    mutate(Species = fct_reorder(Species, mSW)) %>%
    ggplot() +
    aes(Species, mSW, color = Species) +
    geom_point()
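
To check what `fct_reorder` did without plotting, you can inspect the factor levels directly. This is only a minimal sketch: `iris_means` and `reordered` are names made up for the illustration, and the summary table is rebuilt with `summarise()` rather than taken from the question's `iris.tr`.

```r
library(dplyr)
library(forcats)

# Rebuild a one-row-per-species summary (illustrative name: iris_means)
iris_means <- iris %>%
  group_by(Species) %>%
  summarise(mSW = mean(Sepal.Width))

# Default level order of the built-in data set
levels(iris_means$Species)
# "setosa" "versicolor" "virginica"

# Reorder the levels by increasing mSW
reordered <- iris_means %>%
  mutate(Species = fct_reorder(Species, mSW))

levels(reordered$Species)
# "versicolor" "virginica" "setosa"
```

ggplot2 lays out a discrete axis in level order, which is why changing the levels is enough to change the plot.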
Konrad Rudolph
  • What's the difference between `fct_reorder` and `reorder`? – Julien Sep 09 '22 at 11:34
  • @Julien It's mostly very similar but (like most r-lib and tidyverse functions) it performs stricter error-checking and is thus more robust. I wouldn't *generally* use `fct_reorder` over `reorder`, except OP is *already* using the tidyverse packages, and in that scenario I recommend always using the replacements of core R functionality; the modern reimplementations are just that little bit better. – Konrad Rudolph Sep 09 '22 at 12:08
13

Reordering the factor using base:

iris.ba = iris
iris.ba$Species = with(iris.ba, reorder(Species, Sepal.Width, mean))

Translating to dplyr:

iris.tr = iris %>% mutate(Species = reorder(Species, Sepal.Width, mean))

After that, you can continue on to summarize and plot as in your question.


A couple of comments: reordering a factor is modifying a data column. The dplyr command to modify a data column is mutate. All arrange does is re-order rows; this has no effect on the levels of the factor and hence no effect on the order of a legend or axis in ggplot.
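
The difference between re-ordering rows and re-ordering levels is easy to see by comparing `levels()` after each operation. A small sketch, using illustrative names (`means`, `arranged`, `releveled`) that are not part of the question:

```r
library(dplyr)

# One mean per species (illustrative name: means)
means <- iris %>%
  group_by(Species) %>%
  summarise(mSW = mean(Sepal.Width))

# arrange() changes the row order only
arranged <- means %>% arrange(mSW)
as.character(arranged$Species)
# "versicolor" "virginica" "setosa"
levels(arranged$Species)
# "setosa" "versicolor" "virginica"   (unchanged)

# mutate() + reorder() changes the levels themselves
releveled <- means %>% mutate(Species = reorder(Species, mSW))
levels(releveled$Species)
# "versicolor" "virginica" "setosa"
```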

All factors have an order for their levels. The difference between an ordered = TRUE factor and a regular factor is how the contrasts are set up in a model. ordered = TRUE should only be used if your factor levels have a meaningful rank order, like "Low", "Medium", "High", and even then it only matters if you are building a model and don't want the default contrasts comparing everything to a reference level.
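
As a rough illustration of that distinction, here is a sketch with a made-up vector (`sizes` and the other names are hypothetical). Both factors carry the same level order; only the class, and with it the comparison and contrast behavior, differs.

```r
# Hypothetical data with a meaningful rank order
sizes <- c("Low", "High", "Medium", "Low")
lvls  <- c("Low", "Medium", "High")

plain  <- factor(sizes, levels = lvls)
ranked <- factor(sizes, levels = lvls, ordered = TRUE)

# Same levels, in the same order
identical(levels(plain), levels(ranked))
# TRUE

class(plain)
# "factor"
class(ranked)
# "ordered" "factor"

# Rank comparisons are only defined for the ordered factor
ranked[1] < ranked[2]   # "Low" < "High"
# TRUE

# plain[1] < plain[2] would return NA with a warning
```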

Gregor Thomas
  • Great to know about reorder. So what's the difference between `reorder()` and `ordered=TRUE`? I mean, do they modify different elements of the factor or something like that? – David Mas Jul 17 '17 at 16:33
  • I see it uses an attribute with a scores vector. Got it! – David Mas Jul 17 '17 at 16:35
  • `reorder` changes the order of the levels of a factor (or ordered factor). As I said, *all factors have an order for their levels*. `factor(..., ordered = TRUE)` creates an object with class "ordered" and "factor", which has special behavior for contrasts (as I mention in the answer) and for some comparisons (you can use `<` or `>` to compare elements). Unless you need those special behaviors, there is no need for `ordered = TRUE`. – Gregor Thomas Jul 17 '17 at 16:38
2

If you happen to have a character vector to order, for example:

iris2 <- iris %>% 
    mutate(Species = as.character(Species)) %>% 
    group_by(Species) %>% 
    mutate(mean_sepal_width = mean(Sepal.Width)) %>% 
    ungroup()

You can also order the factor levels using the behavior of the forcats::as_factor function:

"Compared to base R, this function creates levels in the order in which they appear"

library(forcats)
iris2 %>% 
    # Change the order
    arrange(mean_sepal_width) %>%  
    # Create factor levels in the order in which they appear
    mutate(Species = as_factor(Species)) %>%
    ggplot() +
    aes(Species, Sepal.Width, color = Species) +
    geom_point()

Notice how the species names on the x axis are not ordered alphabetically but by increasing value of their mean_sepal_width. Remove the line containing as_factor to see the difference.
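
The same contrast can be seen on a bare character vector, without any plotting. A minimal sketch with a made-up vector `x`:

```r
library(forcats)

# Hypothetical character vector, deliberately not in alphabetical order
x <- c("virginica", "setosa", "versicolor", "setosa")

# Base R sorts the levels
levels(factor(x))
# "setosa" "versicolor" "virginica"

# forcats keeps the order of first appearance
levels(as_factor(x))
# "virginica" "setosa" "versicolor"
```

This is why the `arrange()` call has to come before `as_factor()` in the pipeline above: the row order at the moment of conversion becomes the level order.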

Paul Rougieux
-1

In case you'd like to order levels manually: You can do so also with forcats using https://forcats.tidyverse.org/reference/fct_relevel.html
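
For example, a short sketch of manual ordering (the names `f`, `manual` and `front` are made up for the illustration):

```r
library(forcats)

f <- factor(c("setosa", "versicolor", "virginica"))

# Spell out the full level order by hand
manual <- fct_relevel(f, "versicolor", "virginica", "setosa")
levels(manual)
# "versicolor" "virginica" "setosa"

# Naming a single level moves it to the front and keeps the rest as they were
front <- fct_relevel(f, "virginica")
levels(front)
# "virginica" "setosa" "versicolor"
```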

Holger Brandl