
Morning, everybody,

I have a question about the formatting of my graph.

Original graph

Here I'm representing the average group size as a function of distance from the coast. Would it be possible to split each column according to the percentage of observations in each hour class (Heure), while keeping the total height of the column equal to the average?

Here is my data:

 dput(droplevels(df.long2[1:15, ]))
structure(list(Distance = c("1-40", "1-40", "1-40", "40-80", 
"40-80", "40-80", "80-120", "80-120", "80-120", "120-160", "120-160", 
"120-160", "160-225", "160-225", "160-225"), mean = c(6.66901408450704, 
6.66901408450704, 6.66901408450704, 6.33333333333333, 6.33333333333333, 
6.33333333333333, 10.2561403508772, 10.2561403508772, 10.2561403508772, 
11.3986013986014, 11.3986013986014, 11.3986013986014, 23.7051282051282, 
23.7051282051282, 23.7051282051282), erreur_std = c(0.63121621161232, 
0.63121621161232, 0.63121621161232, 0.469878994871701, 0.469878994871701, 
0.469878994871701, 1.29468464273019, 1.29468464273019, 1.29468464273019, 
1.53421016593719, 1.53421016593719, 1.53421016593719, 4.00121147880924, 
4.00121147880924, 4.00121147880924), count = c(142L, 142L, 142L, 
312L, 312L, 312L, 285L, 285L, 285L, 143L, 143L, 143L, 78L, 78L, 
78L), Heure = c("0-4", "4-8", "8-12", "0-4", "4-8", "8-12", "0-4", 
"4-8", "8-12", "0-4", "4-8", "8-12", "0-4", "4-8", "8-12"), n = c(48L, 
79L, 15L, 131L, 148L, 33L, 85L, 152L, 48L, 83L, 51L, 9L, 56L, 
11L, 11L)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", 
"data.frame"))

Unfortunately, when I try to make this graph, I get the result below, because the rows for each distance class are stacked on top of each other:

graph with error

Here is the script I use:

ggplot(df.long2, aes(x=Distance, y = mean, fill = Heure)) +
  geom_col(position = "stack", fill='steelblue', color="gray", stat="identity")+
  geom_errorbar(data = df.long2, aes(ymin = mean-erreur_std, ymax = mean+erreur_std), width = .2, position = position_dodge(width = 0.9))+
  theme_bw() +
  scale_x_discrete(limits=c("1-40", "40-80", "80-120", "120-160", "160-225")) +
  labs(title = "Moyenne de la taille des groupes chez le dauphin commun \n(Delphinus delphis) en fonction de la distance à la côte ", 
       caption = "Source : Observatoire PELAGIS ",
       x = "Distance à la côte (kilomètres)",
       y = "Moyenne de la taille des groupes",
       subtitle = "n=960") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_text(aes(label=count), y=-0.5, hjust = 0.1, stat='count', colour="black", size=3) +
  geom_text(aes(label= "n=" ), y= -0.5, hjust = 1.1, colour="black", size = 3)

Thank you in advance for your response

1 Answer

For a column plot, I recommend calculating the values you want to display beforehand. Build a table containing exactly the values that will appear in the graph, rather than letting ggplot do the calculations for you. In your case, this would be something like:

library(ggplot2)
library(dplyr)

df.long3 <- df.long2 %>% 
  group_by(Distance) %>% 
  summarise(
    mean = mean(mean),
    erreur_std = mean(erreur_std)
  )

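ggplot(df.long3, aes(x=Distance, y = mean)) +
  geom_col(fill='steelblue', color="gray")+
  # Error bars use the summarised table (one row per Distance), not df.long2
  geom_errorbar(aes(ymin = mean-erreur_std, ymax = mean+erreur_std), width = .2)+
  # Keep the distance classes in their natural order instead of alphabetical
  scale_x_discrete(limits=c("1-40", "40-80", "80-120", "120-160", "160-225")) +
  theme_bw()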

barbarplot
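The same "compute first, then plot" idea can also be applied to the split asked for in the question. The following is only a sketch, not a definitive solution: it assumes that each segment should have height mean * n / sum(n) within its Distance class, so that the segments add up to the original mean. The names df.split, df.err, share, segment and dist_levels are made up for this illustration, and the data are the posted rows with mean and erreur_std rounded to three decimals.

```r
library(ggplot2)
library(dplyr)

# Posted data, rebuilt compactly (mean and erreur_std rounded to 3 decimals)
dist_levels <- c("1-40", "40-80", "80-120", "120-160", "160-225")
df.long2 <- data.frame(
  Distance   = rep(dist_levels, each = 3),
  mean       = rep(c(6.669, 6.333, 10.256, 11.399, 23.705), each = 3),
  erreur_std = rep(c(0.631, 0.470, 1.295, 1.534, 4.001), each = 3),
  Heure      = rep(c("0-4", "4-8", "8-12"), times = 5),
  n          = c(48, 79, 15, 131, 148, 33, 85, 152, 48, 83, 51, 9, 56, 11, 11)
)

# Assumption: each Heure class gets a slice of the bar proportional to its
# share of the observations in that Distance class
df.split <- df.long2 %>%
  mutate(Distance = factor(Distance, levels = dist_levels)) %>%
  group_by(Distance) %>%
  mutate(share = n / sum(n),
         segment = mean * share) %>%
  ungroup()

# One row per Distance for the error bars, so they are drawn only once
df.err <- df.split %>%
  distinct(Distance, mean, erreur_std)

p <- ggplot(df.split, aes(x = Distance, y = segment, fill = Heure)) +
  geom_col(position = "stack", color = "gray") +
  geom_errorbar(data = df.err,
                aes(x = Distance, ymin = mean - erreur_std, ymax = mean + erreur_std),
                width = 0.2, inherit.aes = FALSE) +
  theme_bw()
p
```

Because the shares sum to 1 within each Distance class, the stacked segments reach exactly the height of the mean, and the error bar sits on top of the full column. Note that fill is mapped to Heure only inside aes(); setting fill='steelblue' in geom_col() would override that mapping.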

However, I have two concerns about this.

  1. In your dataset, mean and erreur_std are repeated with exactly the same values for each Heure. I suspect this dataset was miscalculated. I assume you left Heure as a grouping variable in your previous summary calculation.

  2. What we call a "barbarplot" is a poor representation of your data. Such an error bar is meaningless if you do not know the distribution of the data. I would recommend a violin plot, which does not assume that the distribution is symmetric. A "barbarplot" suggests that you want to hide the reality of your raw data.

    I cannot give you the code for the violin plot because you did not provide the raw data, but for more information, you can explore these two articles:

Sébastien Rochette