
I am trying to create a picture that summarises my data. The data are about the prevalence of drug use, obtained from different practices in different countries. Each practice has contributed a different amount of data, and I want to show all of this in my picture.

Here is a subset of the data to work on:

gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
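The plotting below relies on gr$prop being each practice's share of its own country's data (the asker confirms this in a comment further down). A quick sanity check, sketched here by re-typing the country and prop values of the 18 drug "a" rows above (the drug "b" rows repeat them), is that the shares add up to roughly 1 within each country:

```r
# Country and prop for the 18 drug "a" rows of gr (drug "b" repeats the same values)
country <- c(rep("c1", 10), rep("c2", 6), rep("c3", 2))
prop <- c(0.027, 0.023, 0.002, 0.500, 0.011, 0.185, 0.097, 0.067, 0.066, 0.023,
          0.433, 0.117, 0.053, 0.199, 0.098, 0.100,
          0.594, 0.406)

# Sum the shares per country; each total should be close to 1
totals <- tapply(prop, country, sum)
print(totals)
```

With these values c1 sums to 1.001 and c2 and c3 to 1, so within a country the widths can be read directly as shares of that country's data.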

The code I wrote is this:

library(ggplot2)

p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
  geom_bar(stat="identity",position = position_dodge(0.9)) +
  labs(x="Drug",y="Prevalence") + 
  geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
  ggtitle("Drug usage by country and practice") +
  scale_fill_manual(values = colour)+ guides(fill=F)

In the figure I obtain, the bars are all drawn on top of each other, whereas I want them dodged (side by side).

[Figure: the resulting plot, with all the bars drawn on top of one another]

I also obtain the following warning:

ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals

Ideally, the bars would sit next to one another, each with its error bar centred on it, all organised by country.

Also, should I be concerned about the warning (which I clearly do not fully understand)?
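A likely reading of the two warnings, offered as a hedged interpretation rather than a definitive diagnosis: the first appears because ymax is given as a fixed layer parameter (ymax = gr$high.CI) or as a single summary value (ymax = max(high.CI)) instead of being mapped to a column inside aes(), so the dodge step falls back on y; the second appears because width = prop gives bars of different widths at the same x position, which position_dodge() cannot place side by side without overlap. They are worth heeding, because they mean the layout is not what was intended. The sketch below uses hypothetical toy data shaped like gr, maps ymin and ymax in aes(), keeps a constant bar width, and shares one position_dodge(0.9) between bars and error bars so each error bar is centred on its bar. Note that it does not encode prop as bar width.

```r
library(ggplot2)

# Hypothetical toy data in the same shape as gr (values are illustrative only)
toy <- data.frame(
  drug       = rep(c("a", "b"), each = 3),
  practice   = rep(c("p1", "p2", "p3"), times = 2),
  prevalence = c(9, 5, 17, 19, 25, 27),
  low.CI     = c(8, 4, 12, 18, 23, 21),
  high.CI    = c(10, 6, 22, 20, 26, 33)
)

# One dodge object shared by both layers, so bars and error bars line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(toy, aes(x = drug, y = prevalence, fill = practice,
                     ymin = low.CI, ymax = high.CI)) +   # ymin/ymax mapped, not set
  geom_col(position = dodge) +                        # constant-width bars, side by side
  geom_errorbar(position = dodge, width = 0.25)       # narrower bars, same centres

# Built layer data: one row per bar / error bar, with dodged x positions
built <- ggplot_build(p)
bars <- built$data[[1]]
errs <- built$data[[2]]
```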

I hope this makes sense. I feel I am close, but I don't seem to be getting anywhere, so any help would be greatly appreciated.

Thank you

MarcoD

2 Answers


ggplot's geom_bar() accepts a width aesthetic, but when the widths vary it does not line the bars up neatly against one another in dodged position. The following workaround is based on the solution here; it computes the x position of each bar explicitly:

library(dplyr)
library(ggplot2)

# calculate x-axis position for bars of varying width
gr <- gr %>%
  group_by(drug) %>%
  arrange(practice) %>%
  mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
  ungroup()

# x-axis labels and break positions, taken from drug "a" (drug "b" has the same props here)
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]

ggplot(gr,
       aes(x = pos, y = prevalence,
           fill = country, width = prop,      # bar width = share of the country's data
           ymin = low.CI, ymax = high.CI)) +
  geom_col(col = "black") +                   # bars centred at pos, edges touching
  geom_errorbar(size = 0.25, colour = "black") +
  facet_wrap(~drug) +                         # one panel per drug
  scale_fill_manual(values = c("c1" = "gray79",
                               "c2" = "gray60",
                               "c3" = "gray39"),
                    guide = "none") +         # hide the fill legend
  scale_x_continuous(name = "Practice",       # the x axis shows practices, not drugs
                     labels = x.labels,
                     breaks = x.pos) +
  labs(title = "Drug usage by country and practice", y = "Prevalence") +
  theme_classic()

[Figure: the resulting plot]
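The pos column above places each bar at its left edge plus half its own width, where the left edge is the running total of the widths before it; this is why bars of width prop end up touching. As an illustration only, the hypothetical helper bar_centres() below computes the same centres as the expression inside mutate(), written out with explicit left and right edges:

```r
# Hypothetical helper (not part of the answer): centre of each bar when bars
# of widths w are laid end to end starting at x = 0
bar_centres <- function(w) {
  right_edges <- cumsum(w)                    # x where each bar ends
  left_edges  <- c(0, head(right_edges, -1))  # x where each bar starts
  (left_edges + right_edges) / 2              # midpoint of each bar
}

w <- c(1, 2, 3)
centres <- bar_centres(w)

# The expression used in the answer's mutate() call, for comparison
answer_pos <- 0.5 * (cumsum(w) + cumsum(c(0, w[-length(w)])))
```

For widths 1, 2 and 3 both give centres 0.5, 2 and 4.5.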

Z.Lin
  • This is exactly what I was after! Brilliant! Had not used dplyr before so have some studying to do there! Thank you so much @Z.Lin ! – MarcoD Sep 27 '17 at 20:27

There is a lot of information you are trying to convey here. To contrast drug A and drug B across countries with bar plots while accounting for proportions, you might use facet_grid(). Try this:

library(ggplot2)

# one grey per practice: a-j are in c1, k-p in c2, q-r in c3
colour <- c(rep("gray79", 10), rep("gray60", 6), rep("gray39", 2))

gr$drug <- paste("Drug", gr$drug)   # facet labels "Drug a", "Drug b"
p <- ggplot(data = gr, aes(x = factor(practice), y = prevalence,
                           ymax = high.CI, ymin = low.CI,
                           fill = practice, width = prop))

# rows = drug, columns = country; scales = "free" keeps only each country's practices on x
p + theme_bw() + facet_grid(drug ~ country, scales = "free") +
  geom_bar(stat = "identity") +
  labs(x = "Practice", y = "Prevalence") +
  geom_errorbar(position = position_dodge(0.9), width = 0.25, size = 0.25, colour = "black") +
  ggtitle("Drug usage by country and practice") +
  scale_fill_manual(values = colour) + guides(fill = "none")

[Figure: faceted bar chart with drugs in rows and countries c1 to c3 in columns]

The bars are very narrow in country c1, and, as you indicated, the one practice that contributes 50% of c1's data (practice d) is quite influential.
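If you stay with facet_grid() and the discrete x axis, one possible tweak (not part of this answer) is to rescale prop within each drug and country panel so that the widest bar fills most of its slot while the widths stay proportional to prop. The helper rescale_widths() and the column name bar_width are hypothetical; you would then map width = bar_width instead of width = prop. Very small shares, such as practice c's 0.2%, will still give very thin bars.

```r
# Hypothetical helper: within each drug x country panel, scale prop so the
# widest bar is max_width wide; widths remain proportional to prop
rescale_widths <- function(df, max_width = 0.9) {
  df$bar_width <- ave(df$prop, df$drug, df$country,
                      FUN = function(p) max_width * p / max(p))
  df
}

# Small illustrative input (hypothetical values)
toy <- data.frame(drug    = c("a", "a", "a"),
                  country = c("c1", "c1", "c2"),
                  prop    = c(0.5, 0.25, 1))
res <- rescale_widths(toy)
```

Usage with the question's data would be gr <- rescale_widths(gr) before building the plot.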

Also, you can specify your aesthetics once in ggplot(aes(...)) rather than repeating them in each layer, and you do not need to include the data frame's name (gr$) inside aes() within the ggplot call.
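A small illustration of mapping aesthetics once, using toy data with hypothetical values: ymin and ymax are mapped in the ggplot() call by bare column names, and geom_errorbar() inherits them without any gr$ prefix. ggplot_build() is used here only to show that the error-bar layer received those columns.

```r
library(ggplot2)

# Toy data standing in for gr (hypothetical values)
toy <- data.frame(practice = c("a", "b"), prevalence = c(10, 20),
                  low.CI = c(8, 18), high.CI = c(12, 22))

# Every aesthetic is mapped once, by bare column name, in ggplot()
p <- ggplot(toy, aes(x = practice, y = prevalence,
                     ymin = low.CI, ymax = high.CI)) +
  geom_col() +                 # uses x and y
  geom_errorbar(width = 0.25)  # inherits x, ymin and ymax from ggplot()

# Layer data of the error bars, showing the inherited ymin/ymax columns
errs <- ggplot_build(p)$data[[2]]
print(errs[, c("ymin", "ymax")])
```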

  • Thank you, that is great! Now that the bars are not all on top of each other, I can see that it had not done what I wanted: my goal is to account for the amount of data contributed by each practice. For example, the 4th column in country C1 accounts for 50% of the data from C1, so it should be much wider, while column 3 accounts for only 0.2%, and so on. The column gr$prop is the proportion of each country's data that comes from each practice. So how do I get the column width to reflect the value in gr$prop? Apologies if this is a completely different question now. Your help is much appreciated! – MarcoD Sep 27 '17 at 14:33
  • Also, I see that this solution did not include width=prop in p<-ggplot(), so could that be why? I have tried to add it, but it gives the error message "Error: Incompatible lengths for set aesthetics: ymax, colour, size, width, ymin" – MarcoD Sep 27 '17 at 14:39
  • Sorry about that, Marco, my bad. I think we can proceed with facet_grid then to get the effect you want. What I think is happening is that the width argument is too large relative to the x scale, and that is what is causing the data to stack on top of each other. I will revise my answer. HTH – James Thomas Durant Sep 27 '17 at 15:29