-1

I want to ask how can I merge Tibbles.

I am trying to do a table for sociodemographics.

Sample data frame:

education <- c("high school", "elementary school", NA, "university", "high school")
relationship <- c("single", "married", "divorced", NA, "single")
occupation <- c("student", "teacher", "unemployed", NA, "doctor")
df <- data.frame(education, relationship, occupation)
df
education relationship occupation
1       high school       single    student
2 elementary school      married    teacher
3              <NA>     divorced unemployed
4        university         <NA>       <NA>
5       high school       single     doctor

I want to calculate the sample size and percentages for each variable and merge them on the "n" and percentages column.

I did "n" and percentage calculations like this:

library(dplyr)
library(formattable)

a <- df %>%filter(!is.na(education))%>%
  group_by(education) %>%
  summarise(n = n()) %>%
  mutate(`%` = formattable::percent(n / sum(n)))
a
A tibble: 3 x 3
education             n `%`       
<chr>             <int> <formttbl>
1 elementary school   1 25.00%    
2 high school         2 50.00%    
3 university          1 25.00% 

b <- df %>%filter(!is.na(relationship))%>%
  group_by(relationship) %>%
  summarise(n = n()) %>%
  mutate(`%` = formattable::percent(n / sum(n)))
b
A tibble: 3 x 3
relationship     n `%`       
<chr>        <int> <formttbl>
1 divorced         1 25.00%    
2 married          1 25.00%    
3 single           2 50.00% 
    
c <- df %>%filter(!is.na(occupation))%>%
 group_by(occupation) %>%
  summarise(n = n()) %>%
  mutate(`%` = formattable::percent(n / sum(n)))

c
A tibble: 4 x 3
occupation     n `%`       
<chr>      <int> <formttbl>
1 doctor         1 25.00%    
2 student        1 25.00%    
3 teacher        1 25.00%    
4 unemployed     1 25.00%  

My question is how can I merge them as one below another by the "n" and "percentage" columns and have one big sociodemographics table?

dplyr
  • 83
  • 5

1 Answers1

0

I recommend not manually calculating "the "n" and percentages column" the way you do. There is a lot of repetition in your code, and this may lead to (human) errors.

Instead you can do the following

df %>%
    pivot_longer(everything()) %>%
    na.omit() %>%
    group_by(name, value) %>%
    summarise(n = n(), .groups = "drop") %>%
    group_by(name) %>%
    mutate(perc = formattable::percent(n / sum(n))) %>%
    ungroup()
## A tibble: 10 × 4
#   name         value                 n perc      
#   <chr>        <chr>             <int> <formttbl>
# 1 education    elementary school     1 25.00%    
# 2 education    high school           2 50.00%    
# 3 education    university            1 25.00%    
# 4 occupation   doctor                1 25.00%    
# 5 occupation   student               1 25.00%    
# 6 occupation   teacher               1 25.00%    
# 7 occupation   unemployed            1 25.00%    
# 8 relationship divorced              1 25.00%    
# 9 relationship married               1 25.00%    
#10 relationship single                2 50.00%    

A few more comments:

  • I advise against using syntactically invalid column names such as %.
  • Avoid repetition. If you find yourself repeatedly pulling out data from a tibble/data.frame using filter and then applying the same calculation to those subsets, consider reshaping your data and using group_by to do the calculation once.
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Thank you so much! this looks a lot more convenient. but I think this code only works after having a name column including all variables and other value columns. But I couldn't figure out how to change my tibble to that tibble structure. @Maurits Evers – dplyr Aug 29 '22 at 10:59
  • @dplyr This is based on the sample data you give. The `name` and `value` columns are created trough pivoting from wide to long. Provided your sample data is representative, this will work for larger/wider data too. There is no need to change anything to a particular "tibble structure" beyond making sure that data looks like your sample data. – Maurits Evers Aug 29 '22 at 13:21