I have a workflow that I run against variations of essentially the same dataset (it's an EMR extract; sometimes I run against iterations of the bulk extract, and sometimes against iterations of test extracts).
These datasets are (supposed to be) homogeneous, and have the same processing requirements in general.
That said, before I migrated the project to drake, a lot of the analysis had been performed on a subset of one of the test datasets, sometimes semi-interactively, with little guarantee of reproducibility.
Generally I don't want to filter my datasets on the same criteria the analysts started from, but for some datasets it's helpful, so I can verify that the workflow really does produce the same results as the original analysis did for the same input.
An example of the starting filter the analysts may have used:
filter_extract_window <- function(df) {
  # Restrict the extract to admissions in Q2 2017
  # (the %>% pipe assumes magrittr/dplyr is attached)
  start <- lubridate::dmy("01-04-2017")
  end <- lubridate::dmy("30-06-2017")
  df %>%
    dplyr::filter(admit_dttm > start, admit_dttm < end)
}
A given dataset is stored entirely separately from the project's code, in a directory tree that contains that dataset's drake cache and a subdirectory holding the raw data.
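As a rough sketch of that layout (directory and file names here are just illustrative), each dataset lives in something like:

dataset_2017_test/        # entirely outside the project's code
├── .drake/               # that dataset's drake cache
└── raw/                  # subdirectory holding the raw extract files
    ├── admissions.csv
    └── ...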
My question, then: what's a nice way to import such a function into my workflow, without it being a statically declared import?
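To illustrate the distinction (file and path names below are hypothetical), a statically declared import would be something hard-wired into the project's own code:

# Static: the filter always comes from the project repo
source("R/filters/filter_extract_window.R")

whereas what I'd like is closer to picking the function up from the dataset's own directory tree, and only for datasets that actually provide one:

# Dynamic (sketch): source a dataset-specific filter only if that dataset ships one
dataset_dir <- "path/to/current/dataset"   # hypothetical; points at the dataset's directory
filter_file <- file.path(dataset_dir, "filters.R")
if (file.exists(filter_file)) {
  source(filter_file)   # defines filter_extract_window() for this dataset
}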