I am looking into using R's targets package, but I am struggling to get it to accept multiple file outputs from a single target.
For example, I want to be able to take a dataset, create a train/test split and write each dataset to a separate file.
An MWE would be the following _targets.R:
library(targets)
source("R/functions.R")
set.seed(124)
list(
  # created using write.csv(mtcars, "data/mtcars.csv")
  tar_target(raw_data, "data/mtcars.csv", format = "file"),
  tar_target(data, read.csv(raw_data)),
  # this throws an error here:
  tar_target(train_test, split_dataset(data), format = "file"),
  # this only shows how I would try to use the train/test datasets
  tar_target(model, train_model(train_test)),
  tar_target(eval, eval_model(model, train_test))
)
where split_dataset() is defined in R/functions.R:
split_dataset <- function(data) {
  idx <- sample.int(nrow(data), 0.8 * nrow(data))
  train <- data[idx, ]
  test <- data[-idx, ]
  write.csv(train, "data/train.csv")
  write.csv(test, "data/test.csv")
  return(c("data/train.csv", "data/test.csv"))
}
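To make the intent concrete, here is a minimal sketch of how the downstream functions could consume the output, assuming the train_test target hands them the character vector of paths that split_dataset() returns. The dir argument and the read_split() helper are hypothetical names introduced only for this illustration, and the sketch writes to a temporary folder so it can be run outside a pipeline.

```r
# Sketch: split a data frame, write both parts, return the file paths.
# `dir` is a hypothetical argument so the output folder is explicit.
split_dataset <- function(data, dir = "data") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  idx <- sample.int(nrow(data), floor(0.8 * nrow(data)))
  paths <- c(file.path(dir, "train.csv"), file.path(dir, "test.csv"))
  write.csv(data[idx, ], paths[1], row.names = FALSE)
  write.csv(data[-idx, ], paths[2], row.names = FALSE)
  paths
}

# Hypothetical helper: pick one of the two files by its base name and read it.
read_split <- function(paths, which = c("train", "test")) {
  which <- match.arg(which)
  read.csv(paths[basename(paths) == paste0(which, ".csv")])
}

# Standalone usage, outside of any pipeline
paths <- split_dataset(mtcars, dir = tempdir())
train <- read_split(paths, "train")
test <- read_split(paths, "test")
```

Under that assumption, train_model() and eval_model() would call something like read_split(train_test, "train") instead of receiving a data frame directly.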
One alternative would be to return a list, list(train = train, test = test), but I want to be able to access either dataset on its own if possible, and to save the datasets as separate files.
Another alternative would be to define the index in the targets list, split the dataset, and write each dataset in a separate target. If possible, I would like to condense these steps into one (as shown above) to make the targets file easier to understand.
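For comparison, the multi-target alternative could look roughly like the sketch below. The helper names make_split_index() and write_subset(), and the target names idx, train_file and test_file, are hypothetical and chosen only for this illustration; the tar_target() calls are guarded so the helpers can still be run without the package installed.

```r
# Hypothetical helper: draw the row indices of the training set.
make_split_index <- function(data, prop = 0.8) {
  sample.int(nrow(data), floor(prop * nrow(data)))
}

# Hypothetical helper: write the selected rows and return the path,
# since a format = "file" target has to return the path(s) it wrote.
write_subset <- function(data, rows, path) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  write.csv(data[rows, ], path, row.names = FALSE)
  path
}

# Targets that would go into the list in _targets.R
if (requireNamespace("targets", quietly = TRUE)) {
  split_targets <- list(
    targets::tar_target(idx, make_split_index(data)),
    targets::tar_target(
      train_file, write_subset(data, idx, "data/train.csv"),
      format = "file"
    ),
    targets::tar_target(
      test_file, write_subset(data, -idx, "data/test.csv"),
      format = "file"
    )
  )
}
```

This is three targets instead of one, which is exactly the verbosity I would like to avoid.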