Dear Stackoverflow community,

Once again, I have a question about what is possible with ggplot2 in R. Before I explain my problem, here is an example data frame:

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)
  • anticoagulation coding: 0 = no anticoagulation, 1 = received anticoagulation
  • atc coding: 0 = nitrofurantoin, 1 = fosfomycin, 2 = trimethoprim

I want to visualise the differences in anticoagulation prescription per age group and per atc group. What I have done so far:

frame <- aggregate(df$anticoagulation, by=list(df$age), FUN=length)
frame$age <- frame$Group.1
frame$n <- frame$x
frame <- frame [,3:4]

my_table<- table(df$age, df$anticoagulation)
table <- as.data.frame.matrix(my_table)
frame$n_noanti <- table$"0"
frame$n_yesanti <- table$"1"

frame$per_yesanti <- (frame$n_yesanti/frame$n)*100 # percentage
frame$per_noanti <- (frame$n_noanti/frame$n)*100 # percentage


ggplot(frame, aes(x=x) ) +
  geom_bar( aes(x = reorder (age, -per_yesanti), y =per_yesanti), stat="identity", fill="#69b3a2" ) +
  geom_label(aes(x=15, y=100, label="Used anticoagulants"), color="#69b3a2")+
  geom_bar( aes( x =reorder (age, -per_noanti), y=-per_noanti), stat="identity", fill="#404080" ) +
  geom_label( aes(x=15, y=-100, label="No anticoagulants"), color="#404080") +
  theme(axis.text.x=element_blank()) + 
  xlab ("Age") + 
  ylab ("Percentages of how many women used anticoagulants")+
  ggtitle("Distribution of anticoagulants per age")+
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size=15))

Output: [image: output of the mirrored ggplot bar chart above]

However, I would like a graph like this but with stacked bars, for example: [image: example of stacked bars]

The stacked parts are based on the atc-coding. I have tried to only make a stacked graph, but that has failed miserably.

I have tried it with the aggregate function, but I am stuck on what to use and what to merge together.

frame2 <- aggregate(df$anticoagulation, by=list(df$age, df$atc), FUN=length) # use df: frame has no anticoagulation or atc column

However, the result of this aggregation is too long to work with.

I have also tried a separate aggregation of atc by age, adding the result to 'frame':

atc2<- table(df$age, df$atc)
t_atc2 <- as.data.frame.matrix(atc2)
frame$n_nitro <- t_atc2$"0"
frame$n_fosfo <- t_atc2$"1"
frame$n_trim <- t_atc2$"2"

But still, I cannot get the stacking to work. My attempt at a stacked bar chart with only the percentage of anticoagulation = yes (coding = 1):

ggplot(frame, aes(fill = n_nitro+n_fosfo+n_trim, y=per_yesanti, x=age)) +
  geom_bar(position="stack", stat="identity") +
  ggtitle("Anticoagulation per age")

Graph: [image: no distinction between the 2 atc groups]

I hope someone can combine the two graphs. If that is not possible, then a stacked graph showing only the percentage with anticoagulation = 1 (per_yesanti) would be fine as well.

So, in short, if the combined graph is too difficult: how can I make the following graph (so only 1 graph)?

  • only observations with anticoagulation = 1 (yes)
  • anticoagulation has to be shown as a percentage (of the total, anticoagulation yes + no)
  • the x-axis is age
  • the bars have to be filled by atc

Like this: [image]

Thanks in advance!

Roontje
  • Why do observations with `anticoagulation == 0` have an entry for `atc coding` that is not `NA`? I would have expected that for persons that didn't receive anticoagulation prescription there is no information which anticoagulant prescription they've received. – starja Jun 04 '20 at 18:12
  • To be honest, I don't really understand what you mean. There are no NA's within the original dataset (and also the dataset I have made for stackoverflow). It is also unknown what kind of anticoagulant the patients have received. Just that they did receive or did not receive an anticoagulant and that is all I need to know. – Roontje Jun 05 '20 at 12:25

1 Answer

I'm still not sure what to make of your data, but I'll try to give an answer. It's a bit difficult to get bar plots of percentages grouped by another variable directly in ggplot2. The easiest solution is therefore to calculate the percentages beforehand and then plot them with geom_col.

Using dplyr, you can group_by both age and the other variable you want to have the stacked separation for:

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

library(dplyr)
library(ggplot2)

df_summary <- df %>% 
  group_by(age, anticoagulation) %>% 
  summarise(count = n()) %>% 
  mutate(percentage = count / sum(count) * 100)


ggplot(df_summary, aes(x = factor(age), y = percentage, fill = factor(anticoagulation))) +
  geom_col()


df_summary_2 <- df %>% 
  group_by(age, atc) %>% 
  summarise(count = n()) %>% 
  mutate(percentage = count / sum(count) * 100)

ggplot(df_summary_2, aes(x = factor(age), y = percentage, fill = factor(atc))) +
  geom_col()



Edit

I've adapted my graph. I couldn't come up with a solution that calculates everything in one go. Therefore, I first calculate the total count per age group in total_count_info, which lets me compute the percentage within every age group later. Then I count the occurrences of atc per age and anticoagulation:

total_count_info <- df %>% 
  group_by(age) %>% 
  summarise(count_age = n())

df_summary_3 <- df %>% 
  group_by(age, anticoagulation, atc) %>% 
  summarise(count = n()) %>% 
  left_join(total_count_info, by = "age") %>%  # name the join key explicitly
  mutate(percentage = count / count_age * 100)


ggplot(df_summary_3 %>% filter(anticoagulation == 1),
       aes(x = factor(age), y = percentage, fill = factor(atc))) +
  geom_col() +
  ylab("percentage of anticoagulation == 1")
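The two-step calculation above (totals per age, then a join) can probably also be written as a single pipeline. This is only a sketch, assuming a dplyr version in which add_count() and count() accept the name argument; df_summary_one_go is a name I made up for illustration:

```r
library(dplyr)

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

# df_summary_one_go is a hypothetical name, not part of the original answer
df_summary_one_go <- df %>%
  # attach the total number of rows per age to every row
  add_count(age, name = "count_age") %>%
  # count rows per age / anticoagulation / atc, carrying count_age along
  count(age, anticoagulation, atc, count_age, name = "count") %>%
  # percentage relative to all observations of that age
  mutate(percentage = count / count_age * 100)
```

It should give the same percentages as df_summary_3, so the filter(anticoagulation == 1) plot works with it unchanged.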

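If you also want the mirrored layout from the question, one option is to flip the sign of the percentages for anticoagulation == 0, so that those bars are drawn below zero while both halves stay stacked by atc. This is a rough sketch, not a polished solution: mirror_summary, signed_percentage and mirror_plot are names I made up, and the atc labels are taken from the coding given in the question.

```r
library(dplyr)
library(ggplot2)

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

# mirror_summary and signed_percentage are hypothetical names for this sketch
mirror_summary <- df %>%
  count(age, anticoagulation, atc, name = "count") %>%
  group_by(age) %>%
  mutate(
    # percentage relative to all observations of that age
    percentage = count / sum(count) * 100,
    # negative values for "no anticoagulation" so they are drawn below zero
    signed_percentage = ifelse(anticoagulation == 1, percentage, -percentage)
  ) %>%
  ungroup() %>%
  # readable legend labels, following the coding given in the question
  mutate(atc = factor(atc, levels = 0:2,
                      labels = c("nitrofurantoin", "fosfomycin", "trimethoprim")))

mirror_plot <- ggplot(mirror_summary,
                      aes(x = factor(age), y = signed_percentage, fill = atc)) +
  geom_col() +                            # positive and negative values stack separately
  geom_hline(yintercept = 0) +            # line separating yes (above) from no (below)
  scale_y_continuous(labels = abs) +      # show absolute percentages on the axis
  labs(x = "Age", y = "Percentage per age group", fill = "atc")

print(mirror_plot)
```

Bars above the zero line are anticoagulation == 1 and bars below it are anticoagulation == 0; within one age the two halves add up to 100%.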

starja
  • Thank you for responding to my question! I see that you made 2 different graphs: 1 with age vs anticoagulation and 1 with age and atc. Is that right? If so, is there a possibility to combine them? To have age on the x-axis, percentage of the amount of anticoagulation (=yes) that is divided/ filled by atc? – Roontje Jun 05 '20 at 12:21