
I am trying to produce a series of box plots in R grouped by two factors. I've managed to make the plot, but I cannot get the boxes to appear in the correct order.

The data frame I am using looks like this:

Nitrogen    Species    Treatment
2           G          L
3           R          M
4           G          H
4           B          L
2           B          M
1           G          H

I tried:

boxplot(mydata$Nitrogen~mydata$Species*mydata$Treatment)

This ordered the boxes alphabetically: the first three were the "High" treatments, and within those three the boxes were ordered alphabetically by species name.


I want the box plot ordered Low>Medium>High then within each of those groups G>R>B for the species.

So I tried using an ordered factor in the formula:

f = ordered(interaction(mydata$Treatment, mydata$Species),
            levels = c("L.G","L.R","L.B","M.G","M.R","M.B","H.G","H.R","H.B"))

then:

boxplot(mydata$Nitrogen~f)

However, the boxes still show up in the same order. The labels are now different, but the boxes have not moved.

As a workaround, I pulled out each subset of the data individually and plotted them all together:

lg = mydata[mydata$Treatment=="L" & mydata$Species=="G", "Nitrogen"]
mg = mydata[mydata$Treatment=="M" & mydata$Species=="G", "Nitrogen"]
hg = mydata[mydata$Treatment=="H" & mydata$Species=="G", "Nitrogen"]
# ... and likewise lr, mr, hr, lb, mb, hb for species R and B

boxplot(lg, lr, lb, mg, mr, mb, hg, hr, hb)

This gives what I want, but I would prefer a more elegant way, so I don't have to pull out each subset individually for larger data sets.


Loadable data:

mydata <-
structure(list(Nitrogen = c(2L, 3L, 4L, 4L, 2L, 1L), Species = structure(c(2L, 
3L, 2L, 1L, 1L, 2L), .Label = c("B", "G", "R"), class = "factor"), 
    Treatment = structure(c(2L, 3L, 1L, 2L, 3L, 1L), .Label = c("H", 
    "L", "M"), class = "factor")), .Names = c("Nitrogen", "Species", 
"Treatment"), class = "data.frame", row.names = c(NA, -6L))
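To see where the alphabetical order comes from, here is a minimal sketch that rebuilds the same six rows with `data.frame()` and `factor()` instead of the `structure()` listing above, then lists the groups with `interaction()`, which should match the grouping that the `Species * Treatment` formula is split by (the name `grp` is illustrative only):

```r
# Same six rows as the question; factor() picks alphabetical levels by default.
mydata <- data.frame(
  Nitrogen  = c(2, 3, 4, 4, 2, 1),
  Species   = factor(c("G", "R", "G", "B", "B", "G")),
  Treatment = factor(c("L", "M", "H", "L", "M", "H"))
)

levels(mydata$Species)    # "B" "G" "R"
levels(mydata$Treatment)  # "H" "L" "M"

# Groups for Species * Treatment: the first factor (Species) varies fastest.
grp <- levels(interaction(mydata$Species, mydata$Treatment))
print(grp)
# "B.H" "G.H" "R.H" "B.L" "G.L" "R.L" "B.M" "G.M" "R.M"
```

The three "H" groups come first because "H" sorts before "L" and "M", which is exactly the order described in the question.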
Old Pro
Robert
  • `boxplot(mydata$Nitrogen~mydata$Species*mydata$Treatment)` and `boxplot(mydata$Nitrogen~f)` produce two different plots for me, with the latter being ordered in the order you want. – Joshua Ulrich Nov 23 '10 at 20:55
  • top tip - use the code button (or indent by 4) to add code. It's cleaner than using and
    – Alex Brown Nov 23 '10 at 22:16

2 Answers


The following commands will create the ordering you need by rebuilding the Treatment and Species factors, with explicit manual ordering of the levels:

mydata$Treatment = factor(mydata$Treatment,c("L","M","H"))

mydata$Species = factor(mydata$Species,c("G","R","B"))
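Put together, a minimal end-to-end sketch of this approach, assuming the question's six rows rebuilt with `data.frame()`. The second `boxplot()` call uses `plot = FALSE` only to return the box labels (as `$names`) so the order can be checked without looking at the plot; `box_names` is an illustrative name:

```r
# The question's data, with default (alphabetical) factor levels.
mydata <- data.frame(
  Nitrogen  = c(2, 3, 4, 4, 2, 1),
  Species   = factor(c("G", "R", "G", "B", "B", "G")),
  Treatment = factor(c("L", "M", "H", "L", "M", "H"))
)

# Rebuild both factors with the level order the boxes should follow.
mydata$Treatment <- factor(mydata$Treatment, c("L", "M", "H"))
mydata$Species   <- factor(mydata$Species,   c("G", "R", "B"))

# Draw the plot: Species varies fastest within each Treatment.
boxplot(Nitrogen ~ Species * Treatment, data = mydata)

# Same grouping without drawing, just to read back the box labels.
box_names <- boxplot(Nitrogen ~ Species * Treatment, data = mydata,
                     plot = FALSE)$names
print(box_names)
# "G.L" "R.L" "B.L" "G.M" "R.M" "B.M" "G.H" "R.H" "B.H"
```

Combinations with no rows (for example R.L) still get a slot, because the formula method keeps empty groups by default.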



Edit 1: oops, I had set it to HML instead of LMH. Fixed.

Edit 2: what factor(X, Y) does:

If you run factor(X, Y) on an existing factor, it uses the order of the values in Y to enumerate the values present in the factor X. Here are some examples with your data:

> mydata$Treatment
[1] L M H L M H
Levels: H L M
> as.integer(mydata$Treatment)
[1] 2 3 1 2 3 1
> factor(mydata$Treatment,c("L","M","H"))
[1] L M H L M H                               <-- not changed
Levels: L M H                                 <-- changed
> as.integer(factor(mydata$Treatment,c("L","M","H")))
[1] 1 2 3 1 2 3                               <-- changed

It does NOT change what the factor looks like at first glance, but it does change how the data is stored.

What's important here is that many plot functions will plot the level with the lowest integer code leftmost, followed by the next, and so on.

If you create factors simply using factor(X), the enumeration is usually based on the alphabetical order of the factor levels (e.g. "H","L","M"). If your labels have a conventional ordering different from the alphabetical one (e.g. "H","M","L"), this can make your graphs seem strange.
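A small sketch of the default-versus-explicit difference, using `table()` as a stand-in for a plot since it also walks the levels in code order (`x`, `f_default`, `f_explicit`, and the `tab_*` objects are illustrative names):

```r
# Treatment values in the order they appear in the question's data.
x <- c("L", "M", "H", "L", "M", "H")

# factor(x) sorts the distinct values to build its levels: H, L, M.
f_default <- factor(x)
# Supplying the levels explicitly gives the conventional L, M, H order.
f_explicit <- factor(x, levels = c("L", "M", "H"))

as.integer(f_default)   # 2 3 1 2 3 1
as.integer(f_explicit)  # 1 2 3 1 2 3

# Summaries follow the level order, not the order of the data.
tab_default  <- table(f_default)
tab_explicit <- table(f_explicit)
names(tab_default)   # "H" "L" "M"
names(tab_explicit)  # "L" "M" "H"
```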

At first glance, it may seem like the problem is due to the ordering of the rows in the data frame, i.e. if only we could place all "H" at the top and "L" at the bottom, it would work. It doesn't. But if you want your labels to appear in the same order as their first occurrence in the data, you can use this form:

 mydata$Treatment = factor(mydata$Treatment, unique(mydata$Treatment))
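A sketch of both points above on the question's data (the `reordered` subset and the `names_*` variables are hypothetical, for illustration only): shuffling the rows leaves the box order alone, while `factor(x, unique(x))` changes it because it rewrites the levels.

```r
mydata <- data.frame(
  Nitrogen  = c(2, 3, 4, 4, 2, 1),
  Treatment = factor(c("L", "M", "H", "L", "M", "H"))
)

# Rows rearranged to L, L, M, M, H, H: the levels are untouched,
# so the boxes still come out in alphabetical level order.
reordered <- mydata[c(1, 4, 2, 5, 3, 6), ]
names_rows <- boxplot(Nitrogen ~ Treatment, data = reordered, plot = FALSE)$names
names_rows    # "H" "L" "M"

# Levels set by first appearance in the data: L, M, H.
mydata$Treatment <- factor(mydata$Treatment, unique(mydata$Treatment))
names_unique <- boxplot(Nitrogen ~ Treatment, data = mydata, plot = FALSE)$names
names_unique  # "L" "M" "H"
```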
Alex Brown
  • See the edit to my post -- I'm not convinced this is actually true. – Dirk Eddelbuettel Nov 23 '10 at 21:27
  • Changing the levels does not adjust the labels listed. However, it does affect the underlying enumeration of those labels. See my comment in your answer for more details. Note that the graph is now in the requested order. – Alex Brown Nov 23 '10 at 21:57

This earlier StackOverflow question shows how to reorder a boxplot based on a numerical value; what you need here is probably just a switch from factor to the related type ordered. But it is hard to say, as we do not have your data and you didn't provide a reproducible example.

Edit: Using the dataset you posted (stored in a variable md) and relying on the solution I pointed to earlier, we get

R> md$Species <- ordered(md$Species, levels=c("G", "R", "B"))
R> md$Treatment <- ordered(md$Treatment, levels=c("L", "M", "H"))
R> with(md, boxplot(Nitrogen ~ Species * Treatment))

which creates the chart you were looking to create.

This is also equivalent to the other solution presented here.
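To check the equivalence claim, a minimal sketch that builds one copy of the data with `factor()` and one with `ordered()`, using the same level order, and compares the box labels `boxplot()` would draw. The helper `box_names()` and the `md_factor`/`md_ordered` copies are hypothetical, defined here only for the comparison:

```r
md <- data.frame(
  Nitrogen  = c(2, 3, 4, 4, 2, 1),
  Species   = factor(c("G", "R", "G", "B", "B", "G")),
  Treatment = factor(c("L", "M", "H", "L", "M", "H"))
)

# Variant 1: plain factors with an explicit level order.
md_factor <- md
md_factor$Species   <- factor(md_factor$Species,   levels = c("G", "R", "B"))
md_factor$Treatment <- factor(md_factor$Treatment, levels = c("L", "M", "H"))

# Variant 2: ordered factors with the same level order.
md_ordered <- md
md_ordered$Species   <- ordered(md_ordered$Species,   levels = c("G", "R", "B"))
md_ordered$Treatment <- ordered(md_ordered$Treatment, levels = c("L", "M", "H"))

# Hypothetical helper: the box labels, in plotting order, without drawing.
box_names <- function(d) {
  boxplot(Nitrogen ~ Species * Treatment, data = d, plot = FALSE)$names
}

is.ordered(md_factor$Treatment)                          # FALSE
is.ordered(md_ordered$Treatment)                         # TRUE
identical(box_names(md_factor), box_names(md_ordered))   # TRUE
```

The two differ outside plotting: by default R gives ordered factors polynomial contrasts (`contr.poly`) in model formulas such as `lm()`, while plain factors get treatment contrasts (`contr.treatment`), so the choice can matter if the same variables are later used in a model.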

Dirk Eddelbuettel
  • I found this example perfectly reproducible. Use the following command to load his data: `mydata=read.table(textConnection(scan(,"character",sep="\n")),head=TRUE)` Then paste his table data in, followed by `^D` – Alex Brown Nov 23 '10 at 21:09
  • ordered is not required here - an explicit *order* to the factor levels is. – Alex Brown Nov 23 '10 at 21:12
  • That doesn't make my question *wrong* but at worst *inefficient*. What's up with your downvote zealousness? – Dirk Eddelbuettel Nov 23 '10 at 21:14
  • Absolutely the data frame remained the same - it's **supposed to**. However, by changing the ordering of the levels in the factor, plot functions such as boxplot, lattice and ggplot plot the data in a different order onscreen. – Alex Brown Nov 23 '10 at 21:56
  • Alex, your comment on ordered is still wrong. Assigning an *ordered* factor variable to an `ordered` type is also clearer. – Dirk Eddelbuettel Nov 24 '10 at 02:44
  • @Dirk his comment is true for ggplot, where a factor in the right order is needed, not an ordered factor. – Brandon Bertelsen Nov 24 '10 at 08:00