
I'd like to use R to make a series of boxplots sorted by median value. Suppose I execute:

  boxplot(cost ~ type)

This gives me boxplots where cost is shown on the y-axis and the type category on the x-axis:

-----     -----
  |         |
 [ ]        |
  |        [ ]
  |         |
-----     -----
  A         B

However, I'd like the boxplots to be sorted from highest to lowest median value. My suspicion is that I need to relabel the types (A or B) so that the labels numerically indicate which has the lowest and which the highest median, but I wonder if there is a more clever way to solve the problem.
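A minimal sketch of why the labels drive the order, assuming a hypothetical data frame `df` with `cost` and `type` columns (the question does not show its data): `boxplot()` with a formula draws one box per level of the grouping factor, in the order given by `levels()`, and a character column converted to a factor gets alphabetically sorted levels.

```r
# Hypothetical example data; the question does not show its real data frame.
set.seed(1)
df <- data.frame(type = rep(c("B", "A", "C"), each = 10),
                 cost = c(rnorm(10, mean = 5), rnorm(10, mean = 1), rnorm(10, mean = 3)))

# A character column is turned into a factor with alphabetically sorted levels,
# and boxplot() draws one box per level in that order: A, B, C.
print(levels(factor(df$type)))
boxplot(cost ~ type, data = df)
```

So the box order can be changed by changing the level order of `type`, without touching the labels themselves.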

zx8754
speciousfool

3 Answers


Check out ?reorder. The example on its help page does what you want, but sorts in the opposite order. I changed count to -count in the first line below so the boxes are sorted from highest to lowest median.

  bymedian <- with(InsectSprays, reorder(spray, -count, median))
  boxplot(count ~ bymedian, data = InsectSprays,
          xlab = "Type of spray", ylab = "Insect count",
          main = "InsectSprays data", varwidth = TRUE,
          col = "lightgray")
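The same idea applied to the question's own formula, as a sketch: the data frame `df` and its values are hypothetical. `reorder()` computes `median()` of the second argument for each `type` and sets the factor levels in increasing order of that score, so negating `cost` puts the largest median first; `levels()` shows the resulting order.

```r
# Hypothetical data standing in for the question's cost/type variables.
set.seed(1)
df <- data.frame(type = rep(c("A", "B", "C"), each = 20),
                 cost = c(rnorm(20, mean = 1), rnorm(20, mean = 5), rnorm(20, mean = 3)))

# Levels sorted by increasing median of -cost, i.e. decreasing median of cost.
df$type_by_median <- with(df, reorder(type, -cost, median))
print(levels(df$type_by_median))   # highest median first

boxplot(cost ~ type_by_median, data = df,
        xlab = "type", ylab = "cost")
```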
Joshua Ulrich

Yes, that is the idea, except that instead of renaming the labels you can reorder the factor levels by median:

  set.seed(42)                      # fix seed
  DF <- data.frame(type = sample(LETTERS[1:5], 100, replace = TRUE),
                   cost = rnorm(100),
                   stringsAsFactors = TRUE)  # type must be a factor (default changed in R 4.0.0)

  boxplot(cost ~ type, data = DF)   # not ordered by median

  # compute index of the levels ordered by median cost and reassign
  # (use order(..., decreasing = TRUE) for highest-to-lowest)
  oind <- order(as.numeric(by(DF$cost, DF$type, median)))
  DF$type <- ordered(DF$type, levels = levels(DF$type)[oind])

  boxplot(cost ~ type, data = DF)   # now it is ordered by median
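A variant of the same approach, sketched with `tapply()` in place of `by()` and `sort(..., decreasing = TRUE)` so the highest median comes first, as the question asked. The example data are recreated (with `type` explicitly converted to a factor) so the block runs on its own.

```r
set.seed(42)
DF <- data.frame(type = factor(sample(LETTERS[1:5], 100, replace = TRUE)),
                 cost = rnorm(100))

# Median cost per level, returned in levels(DF$type) order.
med <- tapply(DF$cost, DF$type, median)

# Reassign levels from highest to lowest median; factor() keeps it unordered.
DF$type <- factor(DF$type, levels = names(sort(med, decreasing = TRUE)))

boxplot(cost ~ type, data = DF)   # highest median on the left
```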
Dirk Eddelbuettel

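Beware of missing values: if the data contain NA, you have to add na.rm = TRUE, which reorder() passes on to median(). Without it, any group that contains an NA gets a median of NA, so it is not sorted by its actual median. It took me hours to find that out.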

  bymedian <- with(InsectSprays, reorder(spray, -count, median, na.rm = TRUE))
  boxplot(count ~ bymedian, data = InsectSprays,
          xlab = "Type of spray", ylab = "Insect count",
          main = "InsectSprays data", varwidth = TRUE,
          col = "lightgray")
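A small sketch of why the argument matters, using a hypothetical data frame with one missing value in group B: extra arguments given to `reorder()` after `FUN` are passed on to `median()`, so `na.rm = TRUE` reaches it. Without it, the median of group B is NA.

```r
# Hypothetical data: group B contains one NA.
df <- data.frame(type = rep(c("A", "B", "C"), each = 4),
                 cost = c(1, 2, 3, 4,  10, 11, NA, 12,  5, 6, 7, 8))

# Per-group medians with and without na.rm: group B becomes NA without it.
print(tapply(df$cost, df$type, median))
print(tapply(df$cost, df$type, median, na.rm = TRUE))

# na.rm = TRUE is forwarded to median() for each group.
bymedian <- with(df, reorder(type, -cost, median, na.rm = TRUE))
print(levels(bymedian))   # "B" "C" "A": highest median first

boxplot(cost ~ bymedian, data = df)
```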
agrm
  • You should specify that this refers to [Joshua Ulrich's answer](http://stackoverflow.com/a/3766007/3982001). It should actually be a comment, but it can also stand on its own as a separate answer. – Fabio says Reinstate Monica Aug 24 '16 at 14:10
  • I flagged it as "not an answer" because the exact same answer is already posted (and accepted); the user just added a new argument. This doesn't improve the quality of the solution and is not sufficient for a separate answer. – pogibas Dec 12 '17 at 18:11