I have been trying to write a simple two-argument function in R that takes a dataset and a categorical feature and, based on that feature, writes one csv file per category into a folder ("DATA") inside the working directory.
The problem is that, simple as the function may be, it always fails: I introduced non-standard evaluation with rlang, but the enquo call throws errors (either that a symbol was expected, or that the argument is not a vector).
The code I used is the following. It assumes there is a folder called "DATA" in the RStudio project to store the split csv files.
library(tidyverse)
library(data.table)
library(rlang)
csv_splitter <- function(df, parameter){
  df <- df
  # We set categorical features missing values vector, with names automatically applied with
  # sapply. We introduce enquo on the parameter for non-standard evaluation.
  categories <- df %>% select(where(is.character))
  NA_in_categories <- sapply(categories, FUN = function(x) {sum(is.na(x))})
  parameter <- enquo(c(parameter))
  # We make sure such parameter is included in the set of categorical features
  if (!!parameter %in% names(NA_in_categories)) {
    df %>%
      split(paste0(".$", !!parameter)) %>%
      map2(.y = names(.), ~ fwrite(.x, paste0('./DATA/data_dfparam_', .y, '.csv')))
    print("The csv's are stored now in your DATA folder")
  } else {
    print("your variable is not here or it is continuous, buddy, try another one")
  }
}
I get either an "arg must be a symbol" error from the enquo call, or an error about parameter not being a vector (which in this version of the code is addressed by the "c(parameter)"). I am stuck and cannot see what else to change.
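For reference, here is a minimal sketch of the behaviour I am after. It is not a confirmed fix. It assumes that enquo() should receive the bare argument rather than c(parameter), and that rlang's as_name() can turn the captured symbol into a plain string for the membership check and the split. The names csv_splitter_sketch, out_dir and example_df are made up for illustration.

```r
library(purrr)
library(rlang)
library(data.table)

# Hypothetical rewrite: capture the bare column name, convert it to a string,
# then do the check and the split with ordinary (standard) evaluation.
csv_splitter_sketch <- function(df, parameter, out_dir = "DATA") {
  # enquo() is given the argument itself, not a call such as c(parameter)
  parameter <- enquo(parameter)
  col_name <- as_name(parameter)

  # Names of the character (categorical) columns
  categorical <- names(df)[vapply(df, is.character, logical(1))]
  if (!col_name %in% categorical) {
    message("Column '", col_name, "' is missing or not categorical")
    return(invisible(character(0)))
  }

  # Assumption: create the output folder if it does not exist yet
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # One data frame per category, then one csv file per data frame
  groups <- split(df, df[[col_name]])
  paths <- file.path(out_dir, paste0("data_", col_name, "_", names(groups), ".csv"))
  walk2(groups, paths, ~ fwrite(.x, .y))
  invisible(paths)
}

# Usage with a toy data frame, writing into a temporary folder
example_df <- data.frame(species = c("a", "b", "a"), value = 1:3,
                         stringsAsFactors = FALSE)
paths <- csv_splitter_sketch(example_df, species,
                             out_dir = file.path(tempdir(), "DATA"))
```

The idea is to keep the tidy-eval part confined to the first two lines of the function, so that the membership check and split() work on an ordinary string instead of relying on !! outside a quasiquotation context.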
If anyone does have a suggestion, I'll be more than happy to try it out on my code. In any case, I'll be extremely grateful for your help!