0

I want to create a table in R that looks something like this:

enter image description here

I have tried many different approaches using dplyr and basic R but nothing looks like I want it to.

Here is an example of my dataframe:

Country Gender Income
Country 1 F Inc1
Country 1 F Inc2
Country 2 F Inc2
Country 1 M Inc1
Country 2 M Inc3
Country 3 F Inc3
Country 3 M Inc2
benson23
  • 16,369
  • 9
  • 19
  • 38
anso_s
  • 1
  • 1

1 Answers1

0

There might be a solution using group_by and summarise with across, but times when it's hard to find those, you can turn to the more flexible option of group_split and purrr::map.

data %>% 
  group_split(Country) %>% #for every country's dataset
  map(function(x, country = x$Country[1]){
    imap_dfr(select(x, -Country), function(values, name){ #for every variable other than 'Country' itself
      tab <- table(values) %>% c() #tabelate its values
      tibble(var = name, value = names(tab), #create a tibble with the name of the variable, its possible values,
             '{country} freq' := tab, #their frequencies,
             '{country} prop' := tab/sum(tab)) #and their proportions
      })
  }) %>%
  purrr::reduce(full_join) #then join all of them

Obs: country = x$Country[1] is just a shortcut to save the current group's country. Later, we use that variable to create the columns for 'freq' and 'prop' with the country name first, using the 'glue' package syntax: '{country} freq' :=.

Just to make it clearer to see what's happening, here's the tibble created in one of the iterations (for 'Country 1' and 'Gender'):

# A tibble: 2 × 4
  var    value `Country1 freq` `Country1 prop`
  <chr>  <chr>           <int>           <dbl>
1 Gender F                   5           0.833
2 Gender M                   1           0.167

The map's are creating one of these for every country and every variable, and later we are joining them all with the reduce(full_join)

Result:

# A tibble: 5 × 8
  var    value `Country1 freq` `Country1 prop` `Country2 freq` `Country2 prop` `Country3 freq` `Country3 prop`
  <chr>  <chr>           <int>           <dbl>           <int>           <dbl>           <int>           <dbl>
1 Gender F                   6          0.5                  5           0.833               2           0.167
2 Gender M                   6          0.5                  1           0.167              10           0.833
3 Income inc1                5          0.417                2           0.333               2           0.167
4 Income inc2                1          0.0833               4           0.667               5           0.417
5 Income inc3                6          0.5                 NA          NA                   5           0.417

Dummy data:

library(tidyverse)
data <- tibble(Country = sample(paste0('Country', 1:3), 30, TRUE),
               Gender = sample(c('F', 'M'), 30, TRUE),
               Income = sample(paste0('inc', 1:3), 30, TRUE))