How do I get the percentage of 4 columns of yes/no data in R?

Question

Let us say, I have a data set with the following columns: Columns with "Yes" and "No" Responses

Dataset: (https://docs.google.com/spreadsheets/d/1TLw-UG8WOlFQ3dCn4Kmdok6I2rJL3M31oYb4_a_AxjU/edit?usp=sharing)

I would like to have the final output of the data to be as show below enter image description here

open_data_portals <- ogd_research_project_csv_col_names_clean %>%

  dplyr::select(Opd_InstitutionWebsite,
                Opd_Portal,
                Opd_MobileApps,
                Opd_CustomerServicePortal) %>%

  dplyr::group_by(Opd_InstitutionWebsite,
                  Opd_Portal,
                  Opd_MobileApps,
                  Opd_CustomerServicePortal) %>% 
  
dplyr::summarise()

I have tried the above but am stuck at summarise. How can I proceed, or how can I solve this challenge?

please use dput(yourdata) or dput(head(yourdata)) to share your data.frame instead of a picture — Mike, Aug 16 '23 at 16:14

r2evans · Answer 1 · 2023-08-16T20:19:38.707

I'll demo a technique using mtcars.

library(dplyr)
mtcars %>%
  select(gear, am, vs) %>%
  # this mutate is compensating for not having your data
  mutate(across(everything(), ~ if_else(. > 0, "Yes", "No"))) %>%
  pivot_longer(everything()) %>%
  group_by(name, value) %>%
  count() %>%
  group_by(name) %>%
  mutate(n = 100 * n / sum(n)) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(name), names_from = "value", values_from = "n") %>%
  mutate(across(-name, ~ coalesce(., 0)))
# # A tibble: 3 × 3
#   name     No   Yes
#   <chr> <dbl> <dbl>
# 1 am     59.4  40.6
# 2 gear    0   100  
# 3 vs     56.2  43.8

I'll adjust, after retrieving your data.

library(dplyr)
library(tidyr)
ogd_research_project_csv_col_names_clean %>%
  select(Opd_InstitutionWebsite,
                Opd_Portal,
                Opd_MobileApps,
                Opd_CustomerServicePortal) %>%
  pivot_longer(everything()) %>%
  group_by(name, value) %>%
  count() %>%
  group_by(name) %>%
  mutate(n = 100 * n / sum(n)) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(name), names_from = "value", values_from = "n") %>%
  # safeguard against an all-Yes or all-No category, o/w not needed
  mutate(across(-name, ~ coalesce(., 0)))
# # A tibble: 4 × 3
#   name                         No   Yes
#   <chr>                     <dbl> <dbl>
# 1 Opd_CustomerServicePortal  44.4  55.6
# 2 Opd_InstitutionWebsite     27.2  72.8
# 3 Opd_MobileApps             23.9  76.1
# 4 Opd_Portal                 33.9  66.1

Reproducible data with results:

quux <- ogd_research_project_csv_col_names_clean %>%
  select(Opd_InstitutionWebsite,
                Opd_Portal,
                Opd_MobileApps,
                Opd_CustomerServicePortal) %>%
  head(n=20)
# cat(deparse1(quux), "\n")
quux <- structure(list(Opd_InstitutionWebsite = c("Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes"), Opd_Portal = c("No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes"), Opd_MobileApps = c("No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes"), Opd_CustomerServicePortal = c("No",  "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No")), row.names = c(NA, 20L), class = "data.frame")

quux %>%
  select(Opd_InstitutionWebsite,
                Opd_Portal,
                Opd_MobileApps,
                Opd_CustomerServicePortal) %>%
  pivot_longer(everything()) %>%
  group_by(name, value) %>%
  count() %>%
  group_by(name) %>%
  mutate(n = 100 * n / sum(n)) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(name), names_from = "value", values_from = "n") %>%
  mutate(across(-name, ~ coalesce(., 0)))
# # A tibble: 4 × 3
#   name                         No   Yes
#   <chr>                     <dbl> <dbl>
# 1 Opd_CustomerServicePortal    40    60
# 2 Opd_InstitutionWebsite       20    80
# 3 Opd_MobileApps               30    70
# 4 Opd_Portal                   40    60

I have tried following the example but I get a challenge on the line code count (), as it returns as "Yes" only without the "No". Here is the link to my dataset (https://docs.google.com/spreadsheets/d/1TLw-UG8WOlFQ3dCn4Kmdok6I2rJL3M31oYb4_a_AxjU/edit?usp=sharing). I will sincerely appreciate your support. — John Keya, Aug 16 '23 at 19:25
See my edit, having had to compensate for having no data previously. Other than the compensatory code of `mtcars %>% select(..) %>% mutate(..)`, all else is unchanged. I hope this makes it clearer. — r2evans, Aug 16 '23 at 20:15

Mark · Accepted Answer · 2023-08-16T16:57:31.153

df |>
  pivot_longer(everything())|>
  mutate(t = n(), .by = name) |>
  summarise(n = (n() / first(t)*100) %>% round() %>% paste0("%"), .by = c(name, value))|>  
  pivot_wider(names_from = value, values_from = n)

Output:

# A tibble: 4 × 3
  name  No    Yes  
  <chr> <chr> <chr>
1 a     52%   48%  
2 b     47%   53%  
3 c     53%   47%  
4 d     41%   59%

Creating the sample df:

set.seed(0)
library(tidyverse)

df <- tibble(
    a  = sample(c("Yes", "No"), 123, replace = TRUE),
    b = sample(c("Yes", "No"), 123, replace = TRUE),
    c = sample(c("Yes", "No"), 123, replace = TRUE),
    d = sample(c("Yes", "No"), 123, replace = TRUE))

How do I get the percentage of 4 columns of yes/no data in R?

2 Answers2