
I have a dataset containing COVID-19 patients with vaccination status and whether they're dead or alive.

ID <- c(1:20)
Group <- c("1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc",
           "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc",
           "1. vacc + unvacc", "2. vacc")
Status <- c("Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", 
            "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive")

df <- data.frame(ID, Group, Status)

So far, I've tried writing some code, and this is as far as I've got:

library(tidyverse)

df %>% 
  mutate_at("Group", as.character) %>%
  list(group_by(.,Group, Status), .) %>%
  map(~summarize(.,cnt = n())) %>%
  bind_rows() %>%
  replace_na(list(Group="Overall"))

Giving me the output:

    `summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
# A tibble: 7 x 3
# Groups:   Group [4]
  Group            Status   cnt
  <chr>            <chr>  <int>
1 1. vacc + unvacc Alive      3
2 1. vacc + unvacc Dead       4
3 2. vacc          Alive      4
4 2. vacc          Dead       3
5 3. vacc          Alive      3
6 3. vacc          Dead       3
7 Overall          NA        20

The output I'm looking for is this:

    `summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
    # A tibble: 12 x 3
    # Groups:   Group [4]
      Group            Status   cnt
      <chr>            <chr>  <int>
    1 1. vacc + unvacc Alive      3
    2 1. vacc + unvacc Dead       4
    3 1. vacc + unvacc All        7
    4 2. vacc          Alive      4
    5 2. vacc          Dead       3
    6 2. vacc          All        7
    7 3. vacc          Alive      3
    8 3. vacc          Dead       3
    9 3. vacc          All        6
   10 Overall          Alive     10
   11 Overall          Dead      10
   12 Overall          All       20
Nick Meier

1 Answer


We could do it this way:

  1. First we count, using the count function from dplyr. The good thing about count is that it does the work of group_by and summarise in a single call.
  2. Then we reshape to wide format with pivot_wider from the tidyr package.
  3. Next we use the handy janitor package to get row and column totals. (We could do this also with base ...)
  4. Then we go back to long format with pivot_longer, renaming the columns.

library(dplyr)
library(tidyr)
library(janitor)

df %>% 
  count(Group, Status) %>% 
  pivot_wider(
    names_from = Status,
    values_from = n
  ) %>% 
  adorn_totals("col", name = "All") %>% 
  adorn_totals("row", name = "Overall") %>% 
  pivot_longer(
    cols= -Group,
    names_to = "Status", 
    values_to = "cnt"
  )

   Group            Status   cnt
   <chr>            <chr>  <dbl>
 1 1. vacc + unvacc Alive      3
 2 1. vacc + unvacc Dead       4
 3 1. vacc + unvacc All        7
 4 2. vacc          Alive      4
 5 2. vacc          Dead       3
 6 2. vacc          All        7
 7 3. vacc          Alive      3
 8 3. vacc          Dead       3
 9 3. vacc          All        6
10 Overall          Alive     10
11 Overall          Dead      10
12 Overall          All       20
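
If you would rather not depend on janitor, a dplyr-only variant is possible. This is only a sketch of a different technique, not the approach above: it stacks the data on relabelled copies of itself, so each patient also lands in the "Overall" group and the "All" status, and then counts once. The names with_group_total, with_totals and counts are made up for illustration, and the rep() calls are assumed to reproduce the Group and Status vectors from the question.

```r
library(dplyr)

# Same values as the data in the question (rep() recycles the patterns)
ID <- 1:20
Group <- rep(c("1. vacc + unvacc", "2. vacc", "3. vacc"), length.out = 20)
Status <- rep(c("Dead", "Alive"), length.out = 20)
df <- data.frame(ID, Group, Status)

# Append a copy of every row relabelled as Group "Overall"
with_group_total <- bind_rows(df, mutate(df, Group = "Overall"))

# Append a copy of all of those rows relabelled as Status "All"
with_totals <- bind_rows(
  with_group_total,
  mutate(with_group_total, Status = "All")
)

# Factor levels fix the row order as Alive, Dead, All within each group
counts <- with_totals %>%
  mutate(Status = factor(Status, levels = c("Alive", "Dead", "All"))) %>%
  count(Group, Status, name = "cnt")

print(counts)
```

Because the totals are ordinary rows here, cnt stays an integer column and the same stacked data can feed any other per-group summary.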
TarJae
  • Thank you @TarJae! I made a mistake and did not mention that I also need to summarise the median values for this variable, the time from last vaccine to ICU admission: Time <- c(2, 3, 4, 2, 1, 5, 2, 1, 2, 3, 2, 3, 4, 2, 1, 5, 2, 1, 2, 3) #incorporated into the same dataframe. I am fairly new to dplyr and not at all familiar with the syntax, so how would I go about finding the median values for each of the 12 groups, and the quartiles too? Once again, thank you so much for your help. I've accepted your answer and marked it green. Merry Christmas! Nick – Nick Meier Dec 24 '21 at 23:02
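
One possible way to get the median and quartiles asked about in the comment, offered as a sketch rather than as the answerer's method: totals of medians cannot be obtained by adding up rows the way counts can, so the code below first duplicates the rows under the "Overall" and "All" labels and then summarises all 12 groups in one pass. It assumes Time has been added to the data frame from the question; with_group_total, with_totals, time_summary and the column names q1 and q3 are illustrative choices only.

```r
library(dplyr)

# Data from the question plus the Time vector from the comment
ID <- 1:20
Group <- rep(c("1. vacc + unvacc", "2. vacc", "3. vacc"), length.out = 20)
Status <- rep(c("Dead", "Alive"), length.out = 20)
Time <- c(2, 3, 4, 2, 1, 5, 2, 1, 2, 3, 2, 3, 4, 2, 1, 5, 2, 1, 2, 3)
df <- data.frame(ID, Group, Status, Time)

# Duplicate rows so each patient also belongs to "Overall" and to "All"
with_group_total <- bind_rows(df, mutate(df, Group = "Overall"))
with_totals <- bind_rows(
  with_group_total,
  mutate(with_group_total, Status = "All")
)

# One row per Group/Status combination: count, quartiles and median of Time
time_summary <- with_totals %>%
  mutate(Status = factor(Status, levels = c("Alive", "Dead", "All"))) %>%
  group_by(Group, Status) %>%
  summarise(
    cnt    = n(),
    q1     = quantile(Time, 0.25, names = FALSE),
    median = median(Time),
    q3     = quantile(Time, 0.75, names = FALSE),
    .groups = "drop"
  )

print(time_summary)
```

Note that quantile() uses its default interpolation method (type 7) here; pass a different type argument if your field reports quartiles another way.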