There might be a solution using group_by
and summarise
with across
, but times when it's hard to find those, you can turn to the more flexible option of group_split
and purrr::map
.
data %>%
group_split(Country) %>% #for every country's dataset
map(function(x, country = x$Country[1]){
imap_dfr(select(x, -Country), function(values, name){ #for every variable other than 'Country' itself
tab <- table(values) %>% c() #tabelate its values
tibble(var = name, value = names(tab), #create a tibble with the name of the variable, its possible values,
'{country} freq' := tab, #their frequencies,
'{country} prop' := tab/sum(tab)) #and their proportions
})
}) %>%
purrr::reduce(full_join) #then join all of them
Obs: country = x$Country[1]
is just a shortcut to save the current group's country. Later, we use that variable to create the columns for 'freq' and 'prop' with the country name first, using the 'glue' package syntax: '{country} freq' :=
.
Just to make it clearer to see what's happening, here's the tibble created in one of the iterations (for 'Country 1' and 'Gender'):
# A tibble: 2 × 4
var value `Country1 freq` `Country1 prop`
<chr> <chr> <int> <dbl>
1 Gender F 5 0.833
2 Gender M 1 0.167
The map's are creating one of these for every country and every variable, and later we are joining them all with the reduce(full_join)
Result:
# A tibble: 5 × 8
var value `Country1 freq` `Country1 prop` `Country2 freq` `Country2 prop` `Country3 freq` `Country3 prop`
<chr> <chr> <int> <dbl> <int> <dbl> <int> <dbl>
1 Gender F 6 0.5 5 0.833 2 0.167
2 Gender M 6 0.5 1 0.167 10 0.833
3 Income inc1 5 0.417 2 0.333 2 0.167
4 Income inc2 1 0.0833 4 0.667 5 0.417
5 Income inc3 6 0.5 NA NA 5 0.417
Dummy data:
library(tidyverse)
data <- tibble(Country = sample(paste0('Country', 1:3), 30, TRUE),
Gender = sample(c('F', 'M'), 30, TRUE),
Income = sample(paste0('inc', 1:3), 30, TRUE))