3

This is the code used to derive the first table in my question.

JH %>% group_by(ATT_ID, CAR=="B") %>%
summarize(count = n(), .groups = "drop")
ATT_ID CAR == "B" Count
ONE FALSE 1
TWO TRUE 1
THREE TRUE 3
THREE FALSE 5
FOUR FALSE 2
FIVE TRUE 4
SIX TRUE 8
SIX FALSE 4

How can I get the table above to look like:

ATT_ID Percentage of "B"
ONE 0%
TWO 100%
THREE 37.5%
FOUR 0%
FIVE 100%
SIX 67%
  • Notice how some ID's are seen twice so as to show the presence of both FALSE & TRUE whereas other ID's appear once to showcase the presence of only one or the other.

Thank you

Antonio
  • 417
  • 2
  • 8
  • If you are looking for SQL recommendations, please be explicit about it, one cannot rely on the tag alone to communicate your intentions (too often it is an incorrect tag). If not SQL, please remove the [tag:sql] tag. – r2evans Dec 12 '22 at 20:40

1 Answers1

4

Grouped by 'ATT_ID', get the sum of Count where CAR=="B" is TRUE and divide by the sum of full Count

library(dplyr)
df1 %>%
   group_by(ATT_ID = factor(ATT_ID, levels = unique(ATT_ID))) %>%
  summarise(Percentage_of_B = paste0(round(
      sum(Count[`CAR == "B"`])/sum(Count) * 100, 1), "%"))

-output

# A tibble: 6 × 2
  ATT_ID Percentage_of_B
  <fct>  <chr>          
1 ONE    0%             
2 TWO    100%           
3 THREE  37.5%          
4 FOUR   0%             
5 FIVE   100%           
6 SIX    66.7%     

data

df1 <- structure(list(ATT_ID = c("ONE", "TWO", "THREE", "THREE", "FOUR", 
"FIVE", "SIX", "SIX"), `CAR == "B"` = c(FALSE, TRUE, TRUE, FALSE, 
FALSE, TRUE, TRUE, FALSE), Count = c(1L, 1L, 3L, 5L, 2L, 4L, 
8L, 4L)), class = "data.frame", row.names = c(NA, -8L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Could I ask the reasoning behind applying levels()? – Antonio Dec 12 '22 at 21:04
  • 1
    @Antonio it is just that with group by, by default, it does reorder the values of 'ATT_ID' in alphabetic order. So, I changed it to `factor` with `levels` specified as the order of first occurrence with `unique` – akrun Dec 12 '22 at 21:18