I am trying to show the percentage change between groups. As a basic approach, I want to compare each group's mean to a default value and show those differences in %. However, I would also like to represent the variability around the means, say in terms of standard deviations. I am not sure how to show this, or whether it even makes sense.

Here is my example of a simple bar plot of the means with error bars, calculated from the mean and sd:

library(dplyr)
library(ggplot2)

dd <- data.frame(id = rep(c(1,2,3), 2),
                 vol = c(10,5,8,11,10,9),
                 reg = rep(c('control', 'new'), each = 3))


# calculate mean and sd per group
sum_dd <- dd %>% 
  group_by(reg) %>% 
  summarize(V_mean = mean(vol, na.rm = TRUE),
            V_sd = sd(vol, na.rm = TRUE))

# Plot bar plot and error bars
sum_dd %>%
  ggplot(aes(x = reg,
             y = V_mean)) +
  geom_bar(stat = 'identity') +
  geom_errorbar(aes(x = reg,
                    ymin = V_mean - V_sd, 
                    ymax = V_mean + V_sd))

This generates a nice bar plot with error bars, where it is obvious what the mean and sd are for each group:

[Image: bar plot of the mean vol per group with sd error bars]

But how can I express the group new as a % change from control?

Here, I first need to calculate the % change from control to new. Then I can plot a bar of the percentage change. But from which values can I calculate something like a standard deviation, to show the variability in my results (e.g., with an error bar)?

sum_dd %>% 
  # Calculate the % change of each group mean relative to the control mean
  mutate(control_mean = V_mean[reg == 'control'],
         perc_change = (V_mean - control_mean) / control_mean * 100) %>%
  filter(reg != 'control') %>% 
  ggplot(aes(x = reg,
             y = perc_change)) +
  geom_bar(stat = 'identity') #+
  # from which values should the error bar be calculated??
  # geom_errorbar(aes(x = reg,
  #                   ymin = V_mean - V_sd, 
  #                   ymax = V_mean + V_sd))

Thanks for your thoughts!

Dan Adams
maycca

1 Answer

First of all, you can get your original plot using stat_summary() more easily because it will calculate the mean and SD for you directly inside the ggplot() call.

But to your question, you can easily calculate the fold change before passing the data to ggplot() by using mutate() with mean(vol[reg == "control"]) as the denominator. Then you can format the y axis as percentages using {scales}.

library(tidyverse)
library(scales)

dd <- data.frame(id = rep(c(1,2,3), 2),
                 vol = c(10,5,8,11,10,9),
                 reg = rep(c('control', 'new'), each = 3))


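# original plot using stat_summary to avoid transforming data
# mean_sdl() with mult = 1 gives mean +/- 1 SD (requires the Hmisc package);
# mean_cl_normal() with mult = 1 would give mean +/- 1 standard error instead
dd %>% 
  ggplot(aes(reg, vol)) + 
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_sdl, fun.args = list(mult = 1))

# calculate % of control: every observation is divided by the control mean,
# so control averages 100% and the SD is rescaled by the same factor
dd %>% 
  mutate(norm_vol = vol/mean(vol[reg == "control"])) %>% 
  ggplot(aes(reg, norm_vol)) + 
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_sdl, fun.args = list(mult = 1)) +
  scale_y_continuous(labels = scales::percent_format())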

Created on 2022-02-21 by the reprex package (v2.0.1)
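If you want to see the numbers behind the normalized plot, or prefer % change (control at 0%) over % of control (control at 100%), here is a minimal sketch of the same idea done by hand. The names control_mean, pct_change, pct_mean and pct_sd are just illustrative. Note the assumption baked in: the control mean is treated as a fixed constant, so the SD of the percentages is simply the original SD divided by the control mean, and any uncertainty in the control mean itself is ignored.

```r
library(dplyr)

dd <- data.frame(id = rep(c(1, 2, 3), 2),
                 vol = c(10, 5, 8, 11, 10, 9),
                 reg = rep(c("control", "new"), each = 3))

# Reference value: the mean of the control group (treated as a constant)
control_mean <- mean(dd$vol[dd$reg == "control"])

# Express every observation as % change from the control mean,
# then summarise the mean and SD of those percentages per group
pct_dd <- dd %>%
  mutate(pct_change = (vol - control_mean) / control_mean * 100) %>%
  group_by(reg) %>%
  summarise(pct_mean = mean(pct_change),
            pct_sd = sd(pct_change))

pct_dd
```

The resulting pct_mean and pct_sd columns can be plotted with geom_bar(stat = "identity") and geom_errorbar(aes(ymin = pct_mean - pct_sd, ymax = pct_mean + pct_sd)), just like in the question.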

Dan Adams
By the way, for more details on how to calculate different summary statistics for the error bars, see [this](https://stackoverflow.com/questions/19258460/standard-error-bars-using-stat-summary) question. – Dan Adams Feb 21 '22 at 14:24
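As a rough illustration of switching the error-bar statistic, stat_summary() accepts any function in fun.data that returns a data frame with y, ymin and ymax columns. The sketch below uses two hypothetical helpers, sd_bars() and se_bars() (they are not part of ggplot2), so you can choose explicitly between SD and standard error without depending on Hmisc.

```r
library(ggplot2)

dd <- data.frame(id = rep(c(1, 2, 3), 2),
                 vol = c(10, 5, 8, 11, 10, 9),
                 reg = rep(c("control", "new"), each = 3))

# Hypothetical helper: mean +/- 1 standard deviation
sd_bars <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Hypothetical helper: mean +/- 1 standard error of the mean
se_bars <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  se <- sd(x) / sqrt(length(x))
  data.frame(y = m, ymin = m - se, ymax = m + se)
}

# Swap sd_bars for se_bars to change what the error bars represent
p <- ggplot(dd, aes(reg, vol)) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = sd_bars, width = 0.3)
p
```

Whichever statistic you pick, it is worth stating it in the axis label or caption, since SD and SE bars look alike but mean different things.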