I am working with the R programming language.
I have the following dataset:
library(dplyr)
df = structure(list(ethnicity = c("c", "c", "c", "b", "c", "b", "b",
"b", "c", "a", "b", "b", "a", "b", "c", "a", "c", "c", "a", "a",
"a", "a", "c", "b", "c", "b", "a", "b", "c", "b", "a", "c", "c",
"a", "c", "b", "a", "c", "a", "a", "b", "c", "c", "a", "c", "a",
"c", "b", "a", "b", "a", "a", "c", "a", "b", "a", "a", "c", "a",
"b", "a", "c", "a", "c", "b", "c", "b", "b", "c", "b", "b", "c",
"c", "a", "b", "b", "a", "b", "a", "a", "b", "c", "c", "a", "b",
"a", "b", "a", "c", "c", "b", "c", "a", "b", "b", "c", "b", "a",
"c", "c"), number_of_degrees = c(3L, 2L, 2L, 3L, 1L, 1L, 3L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 3L, 2L,
3L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 1L,
2L, 1L, 3L, 3L, 2L, 1L, 3L, 1L, 3L, 2L, 2L, 1L, 3L, 2L, 1L, 3L,
3L, 3L, 1L, 2L, 2L, 1L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 3L, 3L, 1L,
3L, 2L, 1L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 3L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 3L, 3L, 2L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-100L))
df %>%
# Group the data by number_of_degrees
group_by(number_of_degrees) %>%
# Calculate the percentage of each ethnicity within each group
summarize(
percent_a = mean(ethnicity == "a") * 100,
percent_b = mean(ethnicity == "b") * 100,
percent_c = mean(ethnicity == "c") * 100
)
This produces the following output:
# A tibble: 3 x 4
number_of_degrees percent_a percent_b percent_c
<int> <dbl> <dbl> <dbl>
1 1 33.3 36.7 30
2 2 31.6 21.1 47.4
3 3 34.4 40.6 25
My Question: Is there a more "compact" way to write this code such that I don't have to manually write "percent_a","percent_b", etc.? This way, it would be much faster and automatically do it for all values of ethnicity.