
I'm new to ggplot2 and have a problem plotting error bars on a bar plot. A minimal working example looks like this:

abun_all <- data.frame("Tree.genus" = c(rep("Acer", 5), rep("Betula", 5), rep("Larix", 5), rep("Picea", 5), rep("Pinus", 5), rep("Quercus", 5)),
               "P.sampled" = c(sample(c(seq(from = 0.001, to = 0.06, by = 0.0005)), 30)),
               "Insects.sampled" = c(sample(c(seq(from = 1.667, to = 533, by = 1.335)), 30)),
               "Category" = as.factor(c(sample(c(seq(from = 1, to = 3, by = 1)), 30, replace = TRUE))),
               "P.sampled_mean" = c(sample(c(seq(from = 0.006, to = 0.178, by = 0.0005)), 30)),
               "P.sampled_sd" = c(sample(c(seq(from = 0.004, to = 0.2137, by = 0.0005)), 30)))

library(ggplot2)

ggplot(data = abun_all, aes(x = as.factor(Tree.genus), y = P.sampled, fill = Category)) +
  geom_bar(stat = "identity", position = position_dodge(1)) +
  geom_errorbar(aes(ymin = P.sampled - (P.sampled_mean + P.sampled_sd),
                    ymax = P.sampled + (P.sampled_mean + P.sampled_sd)),
                width = 0.1, position = position_dodge(1)) +
  scale_fill_discrete(name = "Category",
                      breaks = c(1, 2, 3),
                      labels = c("NrAm in SSM", "NrAm in FR", "Eurp in FR")) +
  xlab("Genus") + ylab("No. of Focus sp. per total insect abundance")

Note: the values are random and do not represent the actual data, but they should suffice to demonstrate the problem.

The problem seems to be that an error bar is plotted for every entry of each Tree.genus per Category, rather than one per combination. How can I get this to work?
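One way to confirm this diagnosis is to count how many rows the data has for each genus/category pair. A minimal base-R sketch, using a trimmed-down rebuild of abun_all with only the columns needed here; the set.seed() call is an addition so the random values are reproducible:

```r
# Trimmed-down rebuild of abun_all (only the columns needed here);
# set.seed() is an addition so the random values are reproducible
set.seed(42)
abun_all <- data.frame(
  Tree.genus = rep(c("Acer", "Betula", "Larix", "Picea", "Pinus", "Quercus"), each = 5),
  P.sampled = sample(seq(from = 0.001, to = 0.06, by = 0.0005), 30),
  Category = as.factor(sample(1:3, 30, replace = TRUE))
)

# Rows per genus x category; any count above 1 means several bars and
# error bars end up at the same dodged position
counts <- table(abun_all$Tree.genus, abun_all$Category)
print(counts)

# Number of genus/category pairs that have more than one row
sum(counts > 1)
```

With 30 rows spread over at most 6 x 3 = 18 pairs, at least one pair is guaranteed to have several rows, so overlapping bars are unavoidable with this data.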

Edit: I created another data frame by hand containing only the maximum P.sampled value of each genus/category combination, and now the plot looks the way I want it (except for the two missing error bars).

genera <- c("Acer", "Betula", "Larix", "Picea", "Pinus", "Quercus")  # assumed: the six genera from abun_all

abun_plot <- data.frame("Tree.genus" = rep(genera, each = 3),
                      "P.sampled" = c(0.400000000, 0.100000000, 0.500000000, 0.200000000, 0.100000000, 0.042857143, 0.016666667, 0.0285714286, 0.0222222222, 0.020000000, 0, 0.010000000, 0.060000000, 0.025000000, 0.040000000, 0.250000000, 0.150000000, 0.600000000),
                      "Category" = as.factor(rep(c(1, 2, 3), 6)),
                      # use NA directly instead of the string "NA" + as.numeric()
                      "P.sampled_SD" = c(0.08493057, 0.02804758, 0.19476489, 0.04533747, 0.02447665, 0.01308939, 0.004200168, NA, 0.015356359, 0.005724859, NA, NA, 0.01633612, 0.01013794, 0.02045931, 0.07584737, 0.05760980, 0.21374053),
                      "P.sampled_Mean" = c(0.07837134, 0.05133333, 0.14089286, 0.04537983, 0.02686200, 0.01680721, 0.005833333, 0.028571429, 0.011363636, 0.01101331, NA, 0.01000000, 0.02162986, 0.01333333, 0.01668582, 0.08705221, 0.04733333, 0.17870370))

ggplot(data = abun_plot, aes(x = as.factor(Tree.genus), y = P.sampled, fill = Category)) +
  geom_bar(stat = "identity", position = position_dodge(1)) +
  geom_errorbar(aes(ymin = P.sampled - P.sampled_SD, ymax = P.sampled + P.sampled_SD), width = 0.1, position = position_dodge(1)) +
  scale_fill_discrete(name = "Category",
                      breaks = c(1, 2, 3),
                      labels = c("NrAm in SSM", "NrAm in FR", "Eurp in FR")) +
  xlab("Genus") + ylab("No. of Focus sp. per total insect abundance")

Since doing this by hand takes a lot of time, and several other plots have the same problem, I would prefer to work with the original data frame (abun_all). Can I subset it within the ggplot() call to get the desired output?

L.Thoma
  • Seems like the issue is that you have multiple observations of some combinations of genus and category. What's your plan for plotting them? Right now they're just being laid in front of each other. To see what I mean, add something like `color = "white"` to your `geom_bar` – camille Oct 03 '18 at 13:22
  • The plan is to plot the range of P.sampled values for each Tree.genus per Category, including an error bar. I get what you mean, though. Thinking about it again, the top of each bar represents the maximum value, so it should be enough to plot the maximum of each combination. Does that make sense? – L.Thoma Oct 03 '18 at 13:52
  • That makes sense if what you want is to plot just the maximum for each, sure. It depends on what you need and the context in which you're presenting the data, but you could filter for just the max of each combination. Maybe you could sketch out the output you're looking for – camille Oct 03 '18 at 14:25
  • See the Edit for an example with the desired plot – L.Thoma Oct 03 '18 at 14:45
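The white-outline check suggested in the first comment can be sketched as follows. It uses geom_col(), which is equivalent to the question's geom_bar(stat = "identity"), on a trimmed-down rebuild of abun_all (set.seed() added for reproducibility), and also counts the bars ggplot2 actually builds via layer_data():

```r
library(ggplot2)

# Trimmed-down rebuild of abun_all; set.seed() is an addition for reproducibility
set.seed(42)
abun_all <- data.frame(
  Tree.genus = rep(c("Acer", "Betula", "Larix", "Picea", "Pinus", "Quercus"), each = 5),
  P.sampled = sample(seq(from = 0.001, to = 0.06, by = 0.0005), 30),
  Category = as.factor(sample(1:3, 30, replace = TRUE))
)

# A white outline makes bars that are drawn on top of each other visible
p <- ggplot(abun_all, aes(x = Tree.genus, y = P.sampled, fill = Category)) +
  geom_col(position = position_dodge(1), color = "white")
print(p)

# ggplot2 builds one bar per row of the data, not one per genus/category pair
nrow(layer_data(p))
```

The built layer has 30 bars even though there are at most 18 genus/category pairs, which is exactly the overlap the comment describes.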

1 Answer


Since you want to show just the maximum value for each combination of genus and category, you can use a couple of dplyr functions (part of the tidyverse, alongside ggplot2) to group by both genus and category, then keep the row with the top value in each group. That way, you don't have to build abun_plot by hand the way you did in the second block.

library(dplyr)
library(ggplot2)

abun_plot <- abun_all %>%
  group_by(Tree.genus, Category) %>%
  # keep the row with the largest P.sampled in each genus/category group
  top_n(1, P.sampled)

head(abun_plot)
#> # A tibble: 6 x 6
#> # Groups:   Tree.genus, Category [6]
#>   Tree.genus P.sampled Insects.sampled Category P.sampled_mean P.sampled_sd
#>   <fct>          <dbl>           <dbl> <fct>             <dbl>        <dbl>
#> 1 Acer          0.041            295.  3                0.0125       0.044 
#> 2 Acer          0.044             81.8 1                0.166        0.037 
#> 3 Acer          0.0085           379.  2                0.155        0.134 
#> 4 Betula        0.0505           183.  2                0.170        0.0805
#> 5 Betula        0.0325            61.7 3                0.0405       0.0995
#> 6 Betula        0.0465           326.  1                0.0985       0.188
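If you'd rather avoid the dplyr dependency, the same per-group maximum can be computed in base R. A sketch using ave() on a trimmed-down rebuild of abun_all (the reduced column set and set.seed() are additions for illustration):

```r
# Trimmed-down rebuild of abun_all; set.seed() is an addition for reproducibility
set.seed(42)
abun_all <- data.frame(
  Tree.genus = rep(c("Acer", "Betula", "Larix", "Picea", "Pinus", "Quercus"), each = 5),
  P.sampled = sample(seq(from = 0.001, to = 0.06, by = 0.0005), 30),
  Category = as.factor(sample(1:3, 30, replace = TRUE)),
  P.sampled_sd = sample(seq(from = 0.004, to = 0.2137, by = 0.0005), 30)
)

# One grouping factor with only the genus/category pairs that actually occur
grp <- interaction(abun_all$Tree.genus, abun_all$Category, drop = TRUE)

# ave() returns, for every row, the maximum P.sampled of that row's group
group_max <- ave(abun_all$P.sampled, grp, FUN = max)

# Keep only the rows that hold their group's maximum
abun_plot <- abun_all[abun_all$P.sampled == group_max, ]
abun_plot
```

Using interaction(..., drop = TRUE) avoids empty groups, for which max() would warn and return -Inf.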

After that, the plotting works as you initially expected:

ggplot(data = abun_plot, aes(x = as.factor(Tree.genus), y = P.sampled , fill = Category)) +
  geom_col(position = position_dodge(1)) +
  geom_errorbar(aes(ymin = P.sampled - P.sampled_sd, ymax = P.sampled + P.sampled_sd), width = 0.1, position = position_dodge(1)) +
  scale_fill_discrete(name = "Category",
                      breaks = c(1, 2, 3),
                      labels = c("NrAm in SSM", "NrAm in FR", "Eurp in FR")) +
  xlab("Genus") + ylab("No. of Focus sp. per total insect abundance")

Note also that recent versions of ggplot2 provide geom_col() as a shorthand for geom_bar(stat = "identity").
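On the question of subsetting inside the ggplot() call: because %>% binds more tightly than +, the grouped filtering can be piped straight into ggplot() without storing an intermediate data frame. A sketch, assuming dplyr 1.0.0 or later, where slice_max() supersedes top_n(); the trimmed data and set.seed() are additions for illustration:

```r
library(dplyr)
library(ggplot2)

# Trimmed-down rebuild of abun_all; set.seed() is an addition for reproducibility
set.seed(42)
abun_all <- data.frame(
  Tree.genus = rep(c("Acer", "Betula", "Larix", "Picea", "Pinus", "Quercus"), each = 5),
  P.sampled = sample(seq(from = 0.001, to = 0.06, by = 0.0005), 30),
  Category = as.factor(sample(1:3, 30, replace = TRUE)),
  P.sampled_sd = sample(seq(from = 0.004, to = 0.2137, by = 0.0005), 30)
)

# Filter inline: keep the row with the largest P.sampled per genus/category
# pair (ties broken arbitrarily) and pipe the result into ggplot()
p <- abun_all %>%
  group_by(Tree.genus, Category) %>%
  slice_max(P.sampled, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  ggplot(aes(x = Tree.genus, y = P.sampled, fill = Category)) +
  geom_col(position = position_dodge(1)) +
  geom_errorbar(aes(ymin = P.sampled - P.sampled_sd, ymax = P.sampled + P.sampled_sd),
                width = 0.1, position = position_dodge(1)) +
  xlab("Genus") + ylab("No. of Focus sp. per total insect abundance")
print(p)
```

Each genus/category pair now contributes exactly one bar and one error bar.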

Created on 2018-10-03 by the reprex package (v0.2.1)

camille