0

I have this dataset for items at a picnic that contributed to a foodborne illness outbreak. Yes or no in the 'sick' column determines whether the patient got sick after the picnic. Yes or no after a food item (apple, banana, strawberry, yogurt, or sandwich) determines whether or not the patient ate that particular food.

    patient sick apple banana strawberry yogurt sandwich
1: patient1  yes    no     no        yes     no       no
2: patient2  yes   yes     no         no    yes       no
3: patient3   no   yes    yes        yes    yes      yes
4: patient4  yes    no     no        yes    yes       no
5: patient5   no    no     no         no    yes       no
6: patient6   no    no     no         no     no       no
7: patient7  yes    no     no         no    yes      yes

I used the group_by() and summarise() functions to create a summary of who ate the food and got sick, who ate the food and didn't get sick, who didn't eat the food and got sick, and who didn't eat the food and didn't get sick.

sick_apple <- picnic %>% 
  group_by(sick, apple) %>% 
  summarise(total=n())
sick_apple

This is the output for that code.

# A tidytable: 4 × 3
# Groups:      sick
  sick  apple total
  <chr> <chr> <int>
1 no    no        2
2 no    yes       1
3 yes   no        3
4 yes   yes       1

How can I extract the value for "yes" "yes" for each food and make it into a list? In other words, how could I calculate these counts for every food item (apple, banana, strawberry, yogurt, or sandwich) and then extract the values for sick=='yes' and food (apple, banana, strawberry, yogurt or sandwich)=='yes'.

If possible, I'd like to have a table that looks something like this, where column B (frequency that age food and got sick) tallies the number of patients where sick=="yes" and the particular food =="yes".

enter image description here

Matt
  • 25
  • 4

1 Answers1

1

You can transform the data into long format, then summarise:

library("tidyr")
library("dplyr")
picnic_long <- pivot_longer(picnic, cols = 3:ncol(picnic), 
    names_to = "food",
    values_to = "eaten")
picnic_long %>% 
    filter(sick == "yes" & eaten =="yes") %>% 
    group_by(food) %>% 
    summarise(frequency = n())

Output:

# A tibble: 4 × 2
  food       frequency
  <chr>          <int>
1 apple              1
2 sandwich           1
3 strawberry         2
4 yogurt             3
Cloudberry
  • 240
  • 2
  • 8
  • Thanks a lot @Cloudberry! This is really helpful and worked perfectly! Quick question, in the pivot_longer() function, does the code `cols=3:ncol(picnic)` clarify that the 3rd column apple all the way to the last column are the ones to turn into the food column? – Matt Apr 17 '23 at 14:21
  • @Matt Yes, the `cols` parameter defines which columns are pivoted and `3:ncol(picnic)` is the range from (and including) the third to the last column of the dataframe. – Cloudberry Apr 17 '23 at 19:00