I started using {drake} for a data production pipeline. The raw data I work with is quite large and is split into ~130 separate Stata files, so each file should be processed separately. To keep the plan readable, I use target() with transform = map() to specify it. It looks similar to the code below:
plan <- drake_plan(
  dta_paths = list.files(my_folder, full.names = TRUE),
  dfs = target(
    read.dta13(dta_path),
    transform = map(dta_path = dta_paths)
  )
)
When I make() the plan, I get the following error:
target dfs_dta_paths
Warning: target dfs_dta_paths warnings:
the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used
the condition has length > 1 and only the first element will be used
fail dfs_dta_paths
Error: Target dfs_dta_paths failed. Call diagnose(dfs_dta_paths) for details. Error message:
  Expecting a single string value: [type=character; extent=129].
As far as I can tell from the warnings and the error message, the mapping over the individual file paths is not happening: there is a single target, dfs_dta_paths, and the full vector of 129 paths is passed to one read.dta13() call. I read https://books.ropensci.org/drake/static.html#map, but it did not help me figure out the problem. Converting the vector of paths to a list did not help either.
From the question "How to combine multiple drake targets into a single cross target without combining the datasets?" I got the idea of predefining a grid, which does work as suggested there. But since I only need a vector, not a complex grid, this looks like over-engineering to me.
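For reference, the grid workaround mentioned above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the two file paths are made-up placeholders standing in for list.files(my_folder, full.names = TRUE), the name paths_grid is hypothetical, and read.dta13() is assumed to come from the readstata13 package. The grid is built outside the plan and spliced in with !!, so the plan only gets constructed here and no file is actually read.

```r
library(drake)

# Hypothetical placeholder paths; in the real pipeline this would be
# list.files(my_folder, full.names = TRUE), evaluated outside the plan.
dta_paths <- c("raw/part_a.dta", "raw/part_b.dta")

# One-column grid: one row per file, column named after the mapped variable.
paths_grid <- data.frame(dta_path = dta_paths, stringsAsFactors = FALSE)

# The grid is spliced in with !! so that map() sees the actual values
# when the plan is built, giving one dfs_* target per row.
plan <- drake_plan(
  dfs = target(
    readstata13::read.dta13(dta_path),  # assumed source of read.dta13()
    transform = map(.data = !!paths_grid)
  )
)

print(plan)
```

With this layout, the plan should contain one row per file rather than a single dfs_dta_paths target.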
I feel like I'm missing something obvious, but I can't spot it. Any ideas what's wrong with my code?
I am aware of https://books.ropensci.org/drake/plans.html#how-to-choose-good-targets, but since I want to iterate during the data cleaning process, I thought it would be helpful to create the dfs target as shown above.