How to calculate the percentage between two variables for specific observations in R?

Question

I'm trying to calculate the incidence/percentage of a binary variable in relation to a variable that contains 5 (+ one NA) different income brackets. I'm using:

afghan %>% group_by(income)  %>% 
  summarize(violent.exp.ISAF = n()) %>%
  mutate(Percentage = violent.exp.ISAF/sum(violent.exp.ISAF)*100)

But this gives me each income bracket's share of all rows in the table, not the percentage of the binary variable within that specific income bracket, like this:

# income          violent.exp.taliban Percentage
#  <chr>                         <int>      <dbl>
#1 10,001-20,000                   616     22.4  
#2 2,001-10,000                   1420     51.6  
#3 20,001-30,000                    93      3.38 
#4 less than 2,000                 457     16.6  
#5 over 30,000                      14      0.508
#6 NA                              154      5.59

And I wanted to have the percentage of the binary variable just within that specific income bracket. Any advice?

A sample of the afghan dataset:

> dput(head(afghan))
structure(list(province = c("Logar", "Logar", "Logar", "Logar", 
"Logar", "Logar"), district = c("Baraki Barak", "Baraki Barak", 
"Baraki Barak", "Baraki Barak", "Baraki Barak", "Baraki Barak"
), village.id = c(80, 80, 80, 80, 80, 80), age = c(26, 49, 60, 
34, 21, 18), educ.years = c(10, 3, 0, 14, 12, 10), employed = c(0, 
1, 1, 1, 1, 1), income = c("2,001-10,000", "2,001-10,000", "2,001-10,000", 
"2,001-10,000", "2,001-10,000", NA), violent.exp.ISAF = c(0, 
0, 1, 0, 0, 0), violent.exp.taliban = c(0, 0, 0, 0, 0, 0), list.group = c("control", 
"control", "control", "ISAF", "ISAF", "ISAF"), list.response = c(0, 
1, 1, 3, 3, 2)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Can you provide a sample of your `afghan` dataset? You can use `dput(afghan)` or `dput(head(afghan))` and provide the output. — Matt, Jul 24 '20 at 15:49
``` structure(list(province = c("Logar", "Logar", "Logar", "Logar", "Logar", "Logar"), village.id = c(80, 80, 80, 80, 80, 80), age = c(26, 49, 60, 34, 21, 18), employed = c(0, 1, 1, 1, 1, 1), income = c("2,001-10,000", "2,001-10,000", "2,001-10,000", "2,001-10,000", "2,001-10,000", NA), violent.exp.ISAF = c(0, 0, 1, 0, 0, 0), violent.exp.taliban = c(0, 0, 0, 0, 0, 0), list.group = c("control", "control", "control", "ISAF", "ISAF", "ISAF"), list.response = c(0, 1, 1, 3, 3, 2)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")) ``` — D C, Jul 24 '20 at 15:54

Matt · Accepted Answer · 2020-07-24T16:06:18.130

Using dplyr/tidyverse and janitor, you can do:

library(tidyverse)
library(janitor)

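afghan %>% 
  # cross-tabulate counts: one row per income bracket, one column per value of violent.exp.ISAF
  # (no group_by() is needed; tabyl() does the grouping itself)
  tabyl(income, violent.exp.ISAF) %>% 
  # convert counts to row-wise proportions, so each income bracket sums to 100%
  adorn_percentages(denominator = "row") %>% 
  # format the proportions as percentage strings, e.g. "80.0%"
  adorn_pct_formatting()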

This shows, for each income bracket, the percentage of respondents with each value of violent.exp.ISAF (computed here on the six-row sample):

       income      0     1
 2,001-10,000  80.0% 20.0%
         <NA> 100.0%  0.0%

To create a tibble:

afghan_tibble <- afghan %>% 
  tabyl(income, violent.exp.ISAF) %>% 
  adorn_percentages(denominator = "row") %>% 
  # note: after this step the percentages are character strings, not numbers
  adorn_pct_formatting() %>% 
  as_tibble()
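
For comparison, here is a minimal dplyr-only sketch of the same within-bracket percentage. The original attempt used `n()`, which counts rows per income bracket, so its `Percentage` column was each bracket's share of the whole table. Assuming `violent.exp.ISAF` is coded 0/1 as in the sample, the mean within each group is the proportion of 1s. The data frame below is rebuilt from the question's `dput()` output (only the two columns used), and the name `pct_by_income` is just an illustrative choice.

```r
library(dplyr)

# Six-row sample taken from the question's dput() output (two columns only)
afghan <- data.frame(
  income = c(rep("2,001-10,000", 5), NA),
  violent.exp.ISAF = c(0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

pct_by_income <- afghan %>%
  group_by(income) %>%
  summarize(
    n = n(),                                                 # respondents in the bracket
    Percentage = mean(violent.exp.ISAF, na.rm = TRUE) * 100  # share of 1s within the bracket
  )

pct_by_income
```

On the full dataset this should give one row per income bracket, with `Percentage` as a plain numeric column that can be plotted directly.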

edited Jul 24 '20 at 16:06

answered Jul 24 '20 at 15:56

It worked! But could I generate a tibble as a result? – D C Jul 24 '20 at 16:04
Awesome! Yes, you can easily create a `tibble`. I updated the post with a way to do that. Please consider marking the question as answered by clicking the green check if this has resolved your issue. – Matt Jul 24 '20 at 16:07
Hey Matt, the problem now is that I'm trying to create a graph but the percentages became characters. Any way to convert them to numeric? – D C Jul 24 '20 at 17:01
@DC if you remove `adorn_pct_formatting() %>% ` that should fix it. You may need to add `mutate(percent_column = percent_column * 100)` – Matt Jul 24 '20 at 17:03
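
Following up on the last comment, a hedged sketch of the numeric version: dropping `adorn_pct_formatting()` leaves the proportions as numbers, and multiplying by 100 turns them into percentages. The sample data is rebuilt from the question's `dput()` output, `afghan_numeric` is an illustrative name, and `across()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(janitor)

# Six-row sample taken from the question's dput() output (two columns only)
afghan <- data.frame(
  income = c(rep("2,001-10,000", 5), NA),
  violent.exp.ISAF = c(0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

afghan_numeric <- afghan %>%
  tabyl(income, violent.exp.ISAF) %>%            # counts per income bracket
  adorn_percentages(denominator = "row") %>%     # row-wise proportions (numeric)
  mutate(across(where(is.numeric), ~ .x * 100))  # proportions -> percentages, still numeric

afghan_numeric
```

The `0` and `1` columns stay numeric here, so they can be passed to a plotting function without converting from character.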
