I want to use manual entries in a QAQC 'log file' to update an existing data frame. The log file below gives date ranges (bounded by datetime_min and datetime_max) over which observations of a particular variable (or 'all' variables) should be omitted from the data frame (set to NA).
library(tidyverse)
library(lubridate)
QC_log <- tibble(
variable = c("SpCond", "pH", "pH", "all"),
datetime_min = ymd_hms(c("2021-06-01 18:00:00","2021-07-19 18:00:00","2021-08-19 18:00:00","2021-11-23 18:00:00")),
datetime_max = ymd_hms(c("2021-06-02 18:00:00","2021-07-25 21:00:00","2021-08-19 20:00:00","2021-11-26 05:00:00"))
)
The log should modify the following example data frame, setting to NA the observations of each variable (for now I am not worried about 'all') that fall between datetime_min and datetime_max.
df <- tibble(
Datetime = ymd_hms(c("2021-06-01 17:00:00","2021-06-01 18:00:00","2021-06-01 19:00:00","2021-11-23 16:00:00","2021-11-23 17:00:00","2021-11-23 18:00:00")),
SpCond = c(220,225,224,230,231,235),
pH = c(7.8,7.9,8.0,7.7,7.8,7.7)
)
I have tried pmap like this:
df%>%
{pmap(QC_log, mutate(., ..1 = ifelse(Datetime > ..2 & Datetime < ..3, "NA", ..1)))}
I assumed pmap was taking ..1, ..2, and ..3 from QC_log, where ..1 is 'variable', ..2 is datetime_min, and ..3 is datetime_max, and passing those as arguments into mutate one QC_log row at a time, so that mutate would conditionally replace observations with NA if they fall into the specified date range.
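For reference, the result I am after can be sketched with a plain loop instead of pmap. This is only a minimal sketch under some assumptions: apply_qc_log is a hypothetical helper name (not from any package), it uses the same strict inequalities as my attempt above, and it simply skips any log row whose variable is not a column of the data (so 'all' is ignored for now).

```r
library(tidyverse)
library(lubridate)

QC_log <- tibble(
  variable = c("SpCond", "pH", "pH", "all"),
  datetime_min = ymd_hms(c("2021-06-01 18:00:00", "2021-07-19 18:00:00",
                           "2021-08-19 18:00:00", "2021-11-23 18:00:00")),
  datetime_max = ymd_hms(c("2021-06-02 18:00:00", "2021-07-25 21:00:00",
                           "2021-08-19 20:00:00", "2021-11-26 05:00:00"))
)

df <- tibble(
  Datetime = ymd_hms(c("2021-06-01 17:00:00", "2021-06-01 18:00:00",
                       "2021-06-01 19:00:00", "2021-11-23 16:00:00",
                       "2021-11-23 17:00:00", "2021-11-23 18:00:00")),
  SpCond = c(220, 225, 224, 230, 231, 235),
  pH = c(7.8, 7.9, 8.0, 7.7, 7.8, 7.7)
)

# Hypothetical helper: apply each log row to the data in turn.
apply_qc_log <- function(data, log) {
  for (i in seq_len(nrow(log))) {
    var <- log$variable[i]
    # Assumption: rows naming a non-existent column (e.g. "all") are skipped.
    if (!var %in% names(data)) next
    # Strictly between datetime_min and datetime_max, as in the pmap attempt.
    in_range <- data$Datetime > log$datetime_min[i] &
      data$Datetime < log$datetime_max[i]
    # A real NA, not the string "NA", so the column stays numeric.
    data[[var]][in_range] <- NA
  }
  data
}

df_qc <- apply_qc_log(df, QC_log)
df_qc
```

With the example data, only the SpCond value at 2021-06-01 19:00:00 lies strictly inside a logged range, so that is the only value that becomes NA; switching to >= and <= would also blank the 18:00:00 observation.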
I think I am having a hard time understanding non-standard evaluation and how arguments get passed through functions, among other things. Hopefully this is simple for now. Eventually I would like this functionality to be more complicated, e.g., changing all observations to NA when variable = 'all'; supporting separate actions such as adding a data flag rather than omitting; or using a specific criterion (e.g., "<10") to omit observations rather than a date range.