0

I have several columns with information on household gender composition (10 variables total). I want to count the number of males in each household.

Head of the dataset:

   gndr  gndr2  gndr3  gndr4  gndr5  gndr6  gndr7  gndr8  gndr9 gndr10 
1   Male Female   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   
2  Female  Male Female   Male   Male   Male   Male   <NA>   <NA>   <NA>   
3  Female  Male Female   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   
4   Male   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   
5   Male Female   Male Female Female   Male   Male   <NA>   <NA>   <NA>   

I want to create a table that has the number of households with no male, with one male, with two males, and so on.

Is there any code in the dplyr and tidyr packages that can do that?

M--
  • 25,431
  • 8
  • 61
  • 93
Alice
  • 41
  • 1
  • 4
  • 3
    Try `table(rowSums(res=="Male",na.rm=TRUE))` (assuming `res` is your object). – nicola Sep 10 '19 at 15:12
  • 2
    `d %>% gather(var, val) %>% group_by(var) %>% table` – d.b Sep 10 '19 at 15:18
  • ```sapply(df1, function(x) table(factor(x, levels=c("Female", "Male", "NA"))))``` – M-- Sep 10 '19 at 15:39
  • @nicola why I am getting this ```#[line1] 1 4 5 #[line2] 3 1 1``` While I was expecting something like this: ```3 2 1 1 1 2 2 0 0 0``` – M-- Sep 10 '19 at 15:41
  • @M-- Please edit the question with the desired output. If I got it correctly, it suffices `colSums(res=="Male",na.rm=TRUE)`. Do you want the number of `Male` for each column, correct? – nicola Sep 11 '19 at 07:45
  • @nicola I'm not the OP, but I think question is clear, and now answered and closed. – M-- Sep 11 '19 at 12:58

1 Answers1

0

This is a pretty standard use case where data was collected in "wide" form and is best worked with in "long" form. The difference is that in long form you only have one column for gender and another for which individual that gender belongs to. We'll use tidyr::gather to wrangle it into long form and the use dplyr to summarize the number of households with 1, 2, 3, ... men.

library(dplyr)
library(tidyr)

wide.df <- tribble(
  ~gndr,  ~gndr2,  ~gndr3,  ~gndr4,  ~gndr5,  ~gndr6,  ~gndr7,  ~gndr8,  ~gndr9, ~gndr10,
  "Male", "Female", NA, NA, NA, NA, NA, NA, NA, NA,   
  "Female", "Male", "Female", "Male", "Male", "Male", "Male", NA, NA, NA,   
  "Female", "Male", "Female", NA, NA, NA, NA, NA, NA, NA,   
  "Male", NA, NA, NA, NA, NA, NA, NA, NA, NA,   
  "Male", "Female", "Male", "Female", "Female", "Male", "Male", NA, NA, NA 
)

wide.df %>%
  mutate(household = 1:nrow(.)) %>%
  gather(key = "individual", value = "gender", -household) %>%
  mutate(individual = factor(individual),
         gender = factor(gender)) %>%
  filter(gender == "Male") %>%
  group_by(household) %>%
  summarize(males = n()) %>%
  arrange(desc(males)) %>%
  group_by(males) %>%
  summarize(households = n())

# # A tibble: 3 x 2
# males households
# <int>      <int>
# 1          3
# 4          1
# 5          1

Or if you want to count the numbers of households with males and females, then it's just an additional grouping.

wide.df %>%
  mutate(household = 1:nrow(.)) %>%
  gather(key = "individual", value = "gender", -household) %>%
  mutate(individual = factor(individual),
         gender = factor(gender)) %>%
  filter(!is.na(gender)) %>%
  group_by(household, gender) %>%
  summarize(count = n()) %>%
  group_by(gender, count) %>%
  summarize(households = n()) %>%
  arrange(count)

# # A tibble: 6 x 3
# # Groups:   gender [2]
# gender count households
# <fct>  <int>      <int>
# Female     1          1
# Male       1          3
# Female     2          2
# Female     3          1
# Male       4          1
# Male       5          1
Gregory
  • 4,147
  • 7
  • 33
  • 44
  • did you see *d.b*'s comment? ```wide.df %>% gather(var, val) %>% group_by(var) %>% table %>% as.data.frame() %>% filter(val == "Male")``` this works if you want Male only, and it is much more straightforward. – M-- Sep 10 '19 at 17:30
  • 1
    @M-- I hadn't seen that, and you're right that it is more straightforward. I modified my code to summarize both genders anyway, in case that was OPs intent. – Gregory Sep 10 '19 at 17:32