Calculate dates based on completion date and make a heatmap

Question

I have a dataset that looks like below

df <- data.frame(
  id = 1:6, # New id column
  drinks_on_previous_day1 = c(5,4,2,1,5,4),
  drinks_on_previous_day2 = c(5,4,2,1,5,5),
drinks_on_previous_day3 = c(5,4,2,1,5,6),
drinks_on_previous_day4 = c(5,4,2,1,5,7),
drinks_on_previous_day5 = c(5,4,2,1,5,8),
  completion_date = as.Date(c("2023-04-18 16:32:17", "2023-04-21 16:32:17", "2023-04-24 16:32:17", "2023-04-25 16:32:17", "2023-05-01 16:32:17", "2023-05-02 16:32:17"))
)

I have the drinks questions indicating the previous days, but I don't have exact dates. These would be different for each participant.

I have their completion date.

For each participant, First I'd like to calculate the dates based on their completion date. I am completely lost as to how to do that for each participant. Finally, I would like the data to be in a long format.

Here is what the dataframe should look like:

df_long <- df %>% 
  pivot_longer(
    cols = starts_with("drinks_on_previous_day"), # Select columns to pivot
    names_to = "day", # Name of the new column for day values
    values_to = "drinks" # Name of the new column for drinks values
  )

But instead of column names in the day column, I'd like to be the exact 5 dates for each participant based on their completion date.

Finally, I would like to plot a heatmap with the drinks on each day with one line for each participant with different colors for different number of drinks.

 >5 drinks -> red
 <5 drinks -> green

M-- · Accepted Answer · 2023-05-03T16:58:28.997

I am not quite sure about the plot, but we can add the dates to the data:

library(tidyverse)

df %>% 
  pivot_longer(
    cols = starts_with("drinks_on_previous_day"), # Select columns to pivot
    names_to = "day", # Name of the new column for day values
    values_to = "drinks" # Name of the new column for drinks values
  ) %>% 
  mutate(day = as.numeric(gsub("drinks_on_previous_day", "", day))) %>% 
  na.omit() %>% 
  mutate(drink_date = completion_date + day - max(day),
         drink_threshold = if_else(drinks >= 5, ">= 5", "< 5"),
         .by = id) -> df_long

#> # A tibble: 30 × 6
#>       id completion_date   day drinks drink_date drink_threshold
#>    <int> <date>          <dbl>  <dbl> <date>     <chr>          
#>  1     1 2023-04-18          1      5 2023-04-14 >= 5           
#>  2     1 2023-04-18          2      5 2023-04-15 >= 5           
#>  3     1 2023-04-18          3      5 2023-04-16 >= 5           
#>  4     1 2023-04-18          4      5 2023-04-17 >= 5           
#>  5     1 2023-04-18          5      5 2023-04-18 >= 5           
#>  6     2 2023-04-21          1      4 2023-04-17 < 5            
#>  7     2 2023-04-21          2      4 2023-04-18 < 5            
#>  8     2 2023-04-21          3      4 2023-04-19 < 5            
#>  9     2 2023-04-21          4      4 2023-04-20 < 5            
#> 10     2 2023-04-21          5      4 2023-04-21 < 5            
#> # ℹ 20 more rows

I created a heatmap-like plot using ggplot2:

ggplot(df_long) +
  geom_bar(aes(drink_date, id, fill = drink_threshold), 
           stat = "identity", color = "black", width=1) +
  facet_wrap(~id, ncol = 1, strip.position="left", scales = "free_y") +
  scale_x_date(date_breaks = "1 day", expand =c(0.01,0.01)) +
  scale_fill_manual(values=c("springgreen2", "firebrick2")) +
  theme_gray() +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_text(angle = 0, vjust = 0.5),
        line = element_blank(),
        axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.spacing = unit(0, "lines"),
        strip.text.y.left = element_text(angle = 0),
        strip.background = element_rect(colour="white", fill="white"))

Calculate dates based on completion date and make a heatmap

1 Answers1