
Overview

I have two data frames called 'ANOVA.Dataframe.1' and 'ANOVA.Dataframe.2' (see below).

For this project, I have two aims:

(1) Fill the boxplots using the package RColorBrewer;

(2) Plot the boxplots side by side using the package cowplot.

Issues

  • First, I created two objects, New.filled.Boxplot.obs1.Canopy.Urban and New.filled.Boxplot.obs2.Canopy.Urban, by adding scale_fill_brewer(palette="Dark2") to the boxplots generated by function 1 and function 2 (see R-code below), following this example. scale_fill_brewer() is a ggplot2 function that uses the RColorBrewer palettes. However, the boxplots were not filled with colour (see image below).

  • When I plotted the boxplots using plot_grid() in the cowplot package, the label headings (i.e. A: Observation Period 1 and B: Observation Period 2) overlaid both boxplots (see image below). Is there a way to adjust the plotting space so that the boxplots are slightly smaller and each label heading sits above its boxplot instead?

If anyone can be of assistance, I would be deeply appreciative.

R-Code

library(tidyverse)
library(wrapr)
library(RColorBrewer)
library(dplyr)

# Show the available Colour Brewer palettes
display.brewer.all()


## Function 1 to produce the boxplots for Dataframe 1

Boxplot.obs1.Canopy.Urban<-ANOVA.Dataframe.1 %.>%
                                   ggplot(data = ., aes(
                                   x = Urbanisation_index,
                                   y = Canopy_Index,
                                   group = Urbanisation_index,
                                   )) +
                                   stat_boxplot(
                                   geom = 'errorbar',
                                   width = .25
                                   ) +
                                   geom_boxplot(notch=T) +
                                   geom_line(
                                   data = group_by(., Urbanisation_index) %>%
                                   summarise(
                                   bot = min(Canopy_Index),
                                   top = max(Canopy_Index)
                                    ) %>%
                                   gather(pos, val, bot:top) %>% 
                                   select(
                                   x = Urbanisation_index,
                                   y = val
                                   ) %>%
                                   mutate(gr = row_number()) %>%
                                   bind_rows(
                                   tibble(
                                   x = 0,
                                   y = max(.$y) * 1.15,
                                   gr = 1:8
                                   )
                                   ),
                                  aes(
                                  x = x,
                                  y = y,
                                  group = gr
                                  )) +
                                  theme_light() +
                                  theme(panel.grid = element_blank()) +
                                  coord_cartesian(
                                  xlim = c(min(.$Urbanisation_index) - .5, max(.$Urbanisation_index) + .5),
                                  ylim = c(min(.$Canopy_Index) * .95, max(.$Canopy_Index) * 1.05)
                                   ) +
                                 ylab('Canopy Index (%)') +
                                 xlab('Urbanisation Index')

 ## Change the colours of the boxplot
New.filled.Boxplot.obs1.Canopy.Urban <- Boxplot.obs1.Canopy.Urban + scale_fill_brewer(palette="Dark2")

 

## Function 2 to produce the boxplots for Dataframe 2
Boxplot.obs2.Canopy.Urban<-ANOVA.Dataframe.2 %.>%
                                   ggplot(data = ., aes(
                                   x = Urbanisation_index,
                                   y = Canopy_Index,
                                   group = Urbanisation_index,
                                   )) +
                                   stat_boxplot(
                                   geom = 'errorbar',
                                   width = .25
                                   ) +
                                   geom_boxplot(notch=T) +
                                   geom_line(
                                   data = group_by(., Urbanisation_index) %>%
                                   summarise(
                                   bot = min(Canopy_Index),
                                   top = max(Canopy_Index)
                                    ) %>%
                                   gather(pos, val, bot:top) %>% 
                                   select(
                                   x = Urbanisation_index,
                                   y = val
                                   ) %>%
                                   mutate(gr = row_number()) %>%
                                   bind_rows(
                                   tibble(
                                   x = 0,
                                   y = max(.$y) * 1.15,
                                   gr = 1:8
                                   )
                                   ),
                                  aes(
                                  x = x,
                                  y = y,
                                  group = gr
                                  )) +
                                  theme_light() +
                                  theme(panel.grid = element_blank()) +
                                  coord_cartesian(
                                  xlim = c(min(.$Urbanisation_index) - .5, max(.$Urbanisation_index) + .5),
                                  ylim = c(min(.$Canopy_Index) * .95, max(.$Canopy_Index) * 1.05)
                                   ) +
                                 ylab('Canopy Index (%)') +
                                 xlab('Urbanisation Index')


## Change the colours of the boxplot

 New.filled.Boxplot.obs2.Canopy.Urban<- Boxplot.obs2.Canopy.Urban + scale_fill_brewer(palette="Dark2")

 

library(cowplot)

## Open New plot window
dev.new()

Combined_boxplot_Obs<-plot_grid(New.filled.Boxplot.obs1.Canopy.Urban, 
                                New.filled.Boxplot.obs2.Canopy.Urban, 
                                labels=c("A: Observation Period 1",
                                         "B: Observation Period 2"),
                                label_fontface="bold",
                                label_fontfamily="Times New Roman",
                                label_size=12,
                                align="v",
                                ncol=2, nrow=1)

Combined_boxplot_Obs

This R-code produces these plots:

[Image: the two boxplots produced by the code above]

Data frame 1

structure(list(Urbanisation_index = c(2, 2, 4, 4, 3, 3, 4, 4, 
4, 2, 4, 3, 4, 4, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 
2, 2, 2, 4, 4, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 4, 4, 
4, 4, 4, 4), Canopy_Index = c(65, 75, 55, 85, 85, 85, 95, 85, 
85, 45, 65, 75, 75, 65, 35, 75, 65, 85, 65, 95, 75, 75, 75, 65, 
75, 65, 75, 95, 95, 85, 85, 85, 75, 75, 65, 85, 75, 65, 55, 95, 
95, 95, 95, 45, 55, 35, 55, 65, 95, 95, 45, 65, 45, 55)), row.names = c(NA, 
-54L), class = "data.frame")

Data frame 2

structure(list(Urbanisation_index = c(2, 2, 4, 4, 3, 3, 4, 4, 
4, 3, 4, 4, 4, 4, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 
2, 2, 2, 4, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 4, 4, 4, 4, 4, 4, 4
), Canopy_Index = c(5, 45, 5, 5, 5, 5, 45, 45, 55, 15, 35, 45, 
5, 5, 5, 5, 5, 5, 35, 15, 15, 25, 25, 5, 5, 5, 5, 5, 5, 15, 25, 
15, 35, 25, 45, 5, 25, 5, 5, 5, 5, 55, 55, 15, 5, 25, 15, 15, 
15, 15)), row.names = c(NA, -50L), class = "data.frame")
Alice Hobbs
  • I'm afraid your code isn't producing anything. Please run it in a fresh R session and correct the issues in your question. – jay.sf Feb 02 '19 at 09:56
  • Hey, jay.sf, I re-ran my code with both data frames, and the code ran smoothly with no problems. However, I did a fresh copy and pasting job just in case some code did not copy over properly when I was writing this question. Does the code work for you now? Many thanks for responding, it means a lot! – Alice Hobbs Feb 02 '19 at 11:25
  • Still gettin' `Error in ANOVA.Dataframe.1 %.>% ggplot(data = ., aes(x = Urbanisation_index, : could not find function "%.>%"`. – jay.sf Feb 02 '19 at 12:01
  • Hi jay.sf, I think I might understand what is happening. Piping is associated with library(dplyr) or library(plyr). Try opening these packages and see if it makes a difference. My code definitely works. Thank you! – Alice Hobbs Feb 02 '19 at 12:05
  • *"Piping is associated with library(dplyr) or library(plyr). Try opening these packages"* - don't you think you demand a lot from those who are supposed to help you? – jay.sf Feb 02 '19 at 12:10
  • Hey jay.sf, I was just trying to provide a helpful solution because I already had the dplyr package open, and perhaps this is why this code is working for me. I had exactly the same error message with previous code, and another kind person on StackOverflow advised me to open dplyr, and then it worked. I am sorry if you felt offended. I added library(dplyr) to the code above, just in case. I do really appreciate your help, thank you! – Alice Hobbs Feb 02 '19 at 12:21

1 Answer

  1. scale_fill_brewer(palette = "Dark2") does not work in your example because you do not provide a fill aesthetic. You need to add one to your boxplot.
  2. The labels in plot_grid are meant to be single letters (or at least short) for reference in a caption. For your purpose I'd recommend using titles in the original plots.
  3. Your code is quite hard to read, and you can reduce the number of packages used. I also shortened the names, as they are not so important here and the long ones make everything more verbose.
  4. I would calculate special statistics not inside the ggplot call, but beforehand in a separate data.frame.
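
To see point 1 in isolation, here is a minimal sketch on a small made-up data frame (demo_df and its columns g and v are placeholders, not the question's data). It builds the same boxplot with and without a fill aesthetic and then reads back the fill colours ggplot2 computed, using layer_data():

```r
library(ggplot2)

# Made-up data, purely for illustration (not the question's data)
demo_df <- data.frame(g = rep(1:3, each = 10), v = c(1:10, 11:20, 21:30))

# No fill aesthetic: scale_fill_brewer() has nothing to act on
p_nofill <- ggplot(demo_df, aes(g, v, group = g)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2")

# fill mapped to a discrete variable: the palette is applied per group
p_fill <- ggplot(demo_df, aes(g, v, group = g)) +
  geom_boxplot(aes(fill = factor(g))) +
  scale_fill_brewer(palette = "Dark2")

# Fill colours ggplot2 actually computed for the boxplot layer
fills_nofill <- unique(layer_data(p_nofill, 1)$fill)
fills_fill   <- unique(layer_data(p_fill, 1)$fill)
```

fills_nofill should hold a single default colour, while fills_fill should hold three distinct Dark2 colours, one per group.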

Packages

library(tidyverse)
library(cowplot)

1st Boxplots

# Calculate special positions for lines first
mydf.1.lines <- mydf.1 %>% 
  group_by(Urbanisation) %>%
  summarise(bot = min(Canopy), top = max(Canopy)) %>%
  gather(pos, val, bot:top) %>% 
  select(x = Urbanisation, y = val) %>%
  mutate(gr = row_number()) %>%
  bind_rows(tibble(x = 0, y = max(.$y) * 1.15, gr = 1:8))

# Calculate plot limits 
xlimits.1 <- with(mydf.1, c(min(Urbanisation) - .5, max(Urbanisation) + .5))
ylimits.1 <- with(mydf.1, c(min(Canopy) * .95, max(Canopy) * 1.05))

Boxplot.1 <- 
  ggplot(mydf.1, aes(Urbanisation, Canopy, group = Urbanisation)) +
  stat_boxplot(geom = 'errorbar', width = .25) +
  # Add a fill aesthetics in the geom_boxplot - call:
  geom_boxplot(aes(fill = factor(Urbanisation)), notch = TRUE) +
  geom_line(data = mydf.1.lines, 
            aes(x, y, group = gr)) +
  theme_light() +
  theme(panel.grid = element_blank()) +
  coord_cartesian(xlim = xlimits.1, ylim = ylimits.1) +
  ylab('Canopy Index (%)') +
  xlab('Urbanisation Index')

New.filled.Boxplot.1 <- Boxplot.1 + scale_fill_brewer(palette = "Dark2")

2nd Boxplots
Analogous to the 1st:

mydf.2.lines <- mydf.2 %>% 
  group_by(Urbanisation) %>%
  summarise(bot = min(Canopy), top = max(Canopy)) %>%
  gather(pos, val, bot:top) %>% 
  select(x = Urbanisation, y = val) %>%
  mutate(gr = row_number()) %>%
  bind_rows(tibble(x = 0, y = max(.$y) * 1.15, gr = 1:8))

xlimits.2 <- with(mydf.2, c(min(Urbanisation) - .5, max(Urbanisation) + .5))
ylimits.2 <- with(mydf.2, c(min(Canopy) * .95, max(Canopy) * 1.05))

Boxplot.2 <- 
  ggplot(mydf.2, aes(Urbanisation, Canopy, group = Urbanisation)) +
  stat_boxplot(geom = 'errorbar', width = .25) +
  geom_boxplot(aes(fill = factor(Urbanisation)), notch = TRUE) +
  geom_line(data = mydf.2.lines, 
            aes(x, y, group = gr)) +
  theme_light() +
  theme(panel.grid = element_blank()) +
  coord_cartesian(xlim = xlimits.2, ylim = ylimits.2) +
  ylab('Canopy Index (%)') +
  xlab('Urbanisation Index')

New.filled.Boxplot.2 <- Boxplot.2 + scale_fill_brewer(palette = "Dark2")
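
Since the two blocks differ only in the data frame, the shared steps could also be wrapped in a function. The following is only a sketch: make_canopy_boxplot and demo_df are hypothetical names, the geom_line layer is left out for brevity, and the data frame is assumed to have the columns Urbanisation and Canopy.

```r
library(ggplot2)

# Hypothetical helper (not part of the code above): builds the filled,
# notched boxplot for any data frame with columns Urbanisation and Canopy.
make_canopy_boxplot <- function(df, title = NULL) {
  # Plot limits, computed the same way as xlimits.* and ylimits.* above
  xlimits <- c(min(df$Urbanisation) - .5, max(df$Urbanisation) + .5)
  ylimits <- c(min(df$Canopy) * .95, max(df$Canopy) * 1.05)

  ggplot(df, aes(Urbanisation, Canopy, group = Urbanisation)) +
    stat_boxplot(geom = "errorbar", width = .25) +
    geom_boxplot(aes(fill = factor(Urbanisation)), notch = TRUE) +
    scale_fill_brewer(palette = "Dark2") +
    theme_light() +
    theme(panel.grid = element_blank()) +
    coord_cartesian(xlim = xlimits, ylim = ylimits) +
    labs(x = "Urbanisation Index", y = "Canopy Index (%)", title = title)
}

# Small made-up data frame, only to show the call
demo_df <- data.frame(
  Urbanisation = rep(1:4, each = 6),
  Canopy = c(35, 55, 65, 75, 85, 95,
             45, 65, 75, 85, 95, 95,
             65, 75, 75, 85, 95, 65,
             45, 55, 65, 75, 85, 95)
)

p_demo <- make_canopy_boxplot(demo_df, title = "Demo")
```

With the real data, the same function would be called once per data frame, e.g. make_canopy_boxplot(mydf.1) and make_canopy_boxplot(mydf.2).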

Combine Plots

plot_grid(New.filled.Boxplot.1 + ggtitle("A: Observation Period 1"),
          New.filled.Boxplot.2 + ggtitle("B: Observation Period 2"), 
          align = "v",
          ncol = 2,
          nrow = 1)

Or with the correct specification of the title and hjust (Thanks to Claus Wilke):

plot_grid(New.filled.Boxplot.1 + ggtitle(""),
          New.filled.Boxplot.2 + ggtitle(""), 
          align = "v",
          labels = c("A: Observation Period 1", "B: Observation Period 2"),
          hjust = 0, 
          label_x = 0.01,
          ncol = 2,
          nrow = 1)

[Image: the combined plot produced by plot_grid()]

Boxplot outside of plot
The problem here is that the notches extend beyond the hinges. If you set notch = FALSE for the second plot (or both), the problem disappears. Alternatively, you could adjust the ylimits, as you already suggested. The function with() simply specifies the data.frame (mydf.2) in which the following columns can be found. Thus the call

ylimits.2 <- with(mydf.2, c(min(Canopy) * .95, max(Canopy) * 1.05))

is equivalent to

ylimits.2 <- c(min(mydf.2$Canopy) * .95, max(mydf.2$Canopy) * 1.05)

and you could for example specify

ylimits.2 <- c(-20, max(mydf.2$Canopy) * 1.05)

This would set the lower limit to -20 and the upper limit to 1.05 times the maximum of the Canopy index in the second dataframe.
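
To get a feeling for how far a notch can reach, the limits can be computed by hand. This is a rough sketch that assumes the rule given in the ggplot2 documentation, median ± 1.58 * IQR / sqrt(n); notch_range is a hypothetical helper, and canopy_urb1 holds the Canopy values of mydf.2 where Urbanisation is 1.

```r
# Hypothetical helper: notch limits under the assumed rule
# median +/- 1.58 * IQR / sqrt(n)
notch_range <- function(x) {
  q <- quantile(x, c(.25, .5, .75), names = FALSE)
  half_width <- 1.58 * (q[3] - q[1]) / sqrt(length(x))
  c(lower = q[2] - half_width, upper = q[2] + half_width)
}

# Canopy values of mydf.2 where Urbanisation == 1
canopy_urb1 <- c(5, 5, 5, 5, 55, 55)

nr <- notch_range(canopy_urb1)

# TRUE: the lower notch limit lies below the smallest observation
nr[["lower"]] < min(canopy_urb1)
```

For this group the lower notch limit comes out at about -19.2, which is consistent with choosing -20 as the lower limit.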

Data

mydf.1 <- 
  structure(list(Urbanisation = c(2, 2, 4, 4, 3, 3, 4, 4, 4, 2, 4, 3, 4, 4, 1, 
                                  1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 
                                  2, 2, 4, 4, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 
                                  2, 1, 4, 4, 4, 4, 4, 4, 4), 
                 Canopy = c(65, 75, 55, 85, 85, 85, 95, 85, 85, 45, 65, 75, 75, 
                            65, 35, 75, 65, 85, 65, 95, 75, 75, 75, 65, 75, 65, 
                            75, 95, 95, 85, 85, 85, 75, 75, 65, 85, 75, 65, 55, 
                            95, 95, 95, 95, 45, 55, 35, 55, 65, 95, 95, 45, 65, 
                            45, 55)), 
            row.names = c(NA, -54L), class = "data.frame")

mydf.2 <- 
  structure(list(Urbanisation = c(2, 2, 4, 4, 3, 3, 4, 4, 4, 3, 4, 4, 4, 4, 1, 
                                  1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 
                                  2, 2, 4, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 4, 4, 
                                  4, 4, 4, 4, 4), 
                 Canopy = c(5, 45, 5, 5, 5, 5, 45, 45, 55, 15, 35, 45, 5, 5, 5, 
                            5, 5, 5, 35, 15, 15, 25, 25, 5, 5, 5, 5, 5, 5, 15, 
                            25, 15, 35, 25, 45, 5, 25, 5, 5, 5, 5, 55, 55, 15, 
                            5, 25, 15, 15, 15, 15)), 
            row.names = c(NA, -50L), class = "data.frame")
kath
  • It's fine to use longer labels in `plot_grid()` as long as you adjust `hjust` accordingly. See here: https://stackoverflow.com/a/47724512/4975218 – Claus Wilke Feb 02 '19 at 17:24
  • Hi Kath, thank you so much for your help, I deeply appreciate your input. One question: the boxplots contained in boxplot 2 or B: Observation Period 2 are not fully contained within the plot, indicating I will need to manipulate the ylim limits. Could I please ask your advice for the interpretation of this line of code? I understand the basic notation for ylim (min, max), but what does the notation in this line of code imply? ylimits.2 <- with(mydf2, c(min(Canopy_Index) * .95, max(Canopy_Index) * 1.05)). You have been brilliant! Have a good day and take care – Alice Hobbs Feb 03 '19 at 04:28
  • @ClausWilke Thanks for pointing to this. I was struggling with the long labels as I did not set the title to be empty (`ggtitle("")`) and then the labels still overlapped the plots. Thanks for the great package!! – kath Feb 03 '19 at 17:13
  • @AliceHobbs please see my edit. I hope this clarifies the code. – kath Feb 03 '19 at 17:23