
I am new to R and to programming in general. I have a data frame similar to this one, but with many more rows:

    yes_no <- c('Yes','No','No','Yes','Yes','No','No','No','Yes','Yes','No','Yes','No','Yes','No','Yes','No','Yes','No','Yes')
    age <- c('1','1','2','3','4','5','1','2','2','3','1','5','5','5','1','4','4','2','5','3')

    data <- data.frame(yes_no, age)

I am trying to create a line graph with ggplot where the x-axis is the age and the y-axis is the percentage of Yes answers for that age.

I am not sure how to compute the percentage.

Any advice? Thank you!

Kishan

2 Answers

Another solution is a stacked bar chart:

Sample data:

    yes_no <- c('Yes','No','No','Yes','Yes','No','No','No','Yes','Yes','No','Yes','No','Yes','No','Yes','No','Yes','No','Yes')
    age <- c('1','1','2','3','4','5','1','2','2','3','1','5','5','5','1','4','4','2','5','3')

    data <- data.frame(yes_no, age)

Draw the plot:

    library(ggplot2)

    ggplot(data, aes(x = factor(age), fill = factor(yes_no))) +
      geom_bar(position = "fill", width = 0.7) +
      # Label each segment with its share of the bar: count / total count at that x.
      # The labels are proportions between 0 and 1.
      # after_stat() replaces the older ..count.. notation (needs ggplot2 >= 3.3.0).
      geom_text(
        aes(label = after_stat(signif(count / tapply(count, x, sum)[as.character(x)], digits = 3))),
        stat = "count",
        position = position_fill(vjust = 0.5)) +
      labs(x = "Age", y = "Percentage", title = "", fill = "") +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 20, color = "black")) +
      theme(axis.title.x = element_text(family = "Times", face = "bold", size = 16, color = "black")) +
      theme(axis.title.y = element_text(family = "Times", face = "bold", size = 16, color = "black")) +
      theme(axis.text.x = element_text(hjust = 1, face = "bold", size = 14, color = "black")) +
      theme(axis.text.y = element_text(hjust = 1, face = "bold", size = 14, color = "black")) +
      theme(legend.title = element_text(family = "Times", color = "black", size = 16, face = "bold"),
            legend.text = element_text(family = "Times", color = "black", size = 14, face = "bold"),
            legend.position = "bottom")
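
The proportions printed on the bars can be checked outside ggplot. The following is only a minimal sketch in base R, assuming the same sample vectors as above; the names `counts` and `props` are made up for illustration and are not part of the plot code.

```r
# Same sample data as in the question
yes_no <- c('Yes','No','No','Yes','Yes','No','No','No','Yes','Yes',
            'No','Yes','No','Yes','No','Yes','No','Yes','No','Yes')
age <- c('1','1','2','3','4','5','1','2','2','3',
         '1','5','5','5','1','4','4','2','5','3')

# Cross-tabulate answers (rows) against age (columns)
counts <- table(yes_no, age)

# Divide each column by its total, so every age sums to 1
props <- prop.table(counts, margin = 2)

# Round the same way as the bar labels
print(signif(props, digits = 3))
```

Each column of `props` corresponds to one bar, and each row to one fill colour.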


Rfanatic
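
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    data %>%
      group_by(age, yes_no) %>%
      mutate(k = n()) %>%            # k: rows per age and answer
      ungroup(yes_no) %>%            # keep the grouping by age only
      mutate(n = n(),                # n: rows per age
             p = 100 * k / n) %>%    # p: percentage of each answer within its age
      unique() %>%                   # one row per age and answer
      ungroup() %>%
      # add age/answer combinations that never occur, with zero counts
      complete(age, yes_no,
               fill = list(k = 0, n = 0, p = 0)) %>%
      ggplot(aes(x = age, y = p, group = yes_no, color = yes_no)) +
      geom_line() +
      ylim(c(0, 100))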
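
If only the percentage of Yes per age is needed, as the question asks, a shorter variant might look like the sketch below. It assumes the same sample data; the names `pct_by_age`, `pct_yes`, and `p_yes_plot` are made up for illustration. It relies on `mean()` of a logical vector giving the share of `TRUE` values.

```r
library(dplyr)
library(ggplot2)

# Same sample data as in the question
yes_no <- c('Yes','No','No','Yes','Yes','No','No','No','Yes','Yes',
            'No','Yes','No','Yes','No','Yes','No','Yes','No','Yes')
age <- c('1','1','2','3','4','5','1','2','2','3',
         '1','5','5','5','1','4','4','2','5','3')
data <- data.frame(yes_no, age)

# One row per age: share of rows where the answer is "Yes", as a percentage
pct_by_age <- data %>%
  group_by(age) %>%
  summarise(pct_yes = 100 * mean(yes_no == "Yes"))

# group = 1 joins the points into one line even though age is a character column
p_yes_plot <- ggplot(pct_by_age, aes(x = age, y = pct_yes, group = 1)) +
  geom_line() +
  geom_point() +
  ylim(c(0, 100)) +
  labs(x = "Age", y = "Percentage of Yes")

print(p_yes_plot)
```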


Tur
  • Hi, thank you! What is the purpose of adding the mutate(k = n()) %>% ? – Kishan Mar 01 '22 at 13:14
  • It creates a new variable that counts the number of rows in each group. In the first mutate, k = n() counts the Yes and No answers for each value of age (the data is grouped by age and yes_no). In the second mutate, n = n() counts the total number of individuals of the same age (yes_no has been ungrouped, so the data is grouped by age only). The %>% is a pipe that chains function calls: data %>% mean() is equivalent to mean(data). – Tur Mar 01 '22 at 14:10
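
The two counting steps described in the comment can be inspected one at a time. This is only an illustrative sketch, assuming the sample data from the question and dplyr 1.0.0 or later (for `ungroup(yes_no)`); the names `with_k`, `with_n`, and `result` are made up for the walk-through.

```r
library(dplyr)

# Same sample data as in the question
yes_no <- c('Yes','No','No','Yes','Yes','No','No','No','Yes','Yes',
            'No','Yes','No','Yes','No','Yes','No','Yes','No','Yes')
age <- c('1','1','2','3','4','5','1','2','2','3',
         '1','5','5','5','1','4','4','2','5','3')
data <- data.frame(yes_no, age)

# Step 1: grouped by age and answer, n() is the number of rows per combination
with_k <- data %>%
  group_by(age, yes_no) %>%
  mutate(k = n())

# Step 2: grouped by age only, n() is the number of rows per age
with_n <- with_k %>%
  ungroup(yes_no) %>%
  mutate(n = n(), p = 100 * k / n)

# Collapse the repeated rows to one per age and answer
result <- with_n %>%
  ungroup() %>%
  distinct() %>%
  arrange(age, yes_no)
print(result)

# The pipe passes its left-hand side as the first argument of the call on the right
x <- c(1, 2, 3)
same <- identical(mean(x), x %>% mean())
```

Age 3 has only Yes answers in this sample, so `result` has no No row for it; that gap is what `complete()` fills in the answer's code.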