I have been trying to write a simple two-argument function in R that takes a dataset and the name of a categorical feature and, based on that feature, writes one csv file per category to a folder ("DATA") inside the working directory.

Simple as the function may be, I keep running into problems: I introduced non-standard evaluation with rlang, but the enquo call throws errors (either that a symbol was expected or that the argument is not a vector), so the function always fails.

The code I used is below. It assumes there is a folder called "DATA" in the RStudio project to store the split csv files.

library(tidyverse)
library(data.table)
library(rlang)

csv_splitter <- function(df, parameter){
  
  df <- df
  
  # Count the missing values in each categorical (character) column; sapply
  # names the result automatically. enquo is applied to the parameter for
  # non-standard evaluation.

  categories <- df %>% select(where(is.character)) 
  NA_in_categories <- sapply(categories, FUN = function(x) {sum(is.na(x))})
  parameter <- enquo(c(parameter))
  
  # Make sure the parameter is one of the categorical features

  if (!!parameter %in% names(NA_in_categories)) {
    df %>% 
        split(paste0(".$", !!parameter)) %>% 
        map2(.y = names(.), ~ fwrite(.x, paste0('./DATA/data_dfparam_', .y, '.csv')))
    print("The csv's are stored now in your DATA folder")
  } else {
    print("your variable is not here or it is continuous, buddy, try another one")
  }
}

The error is either "arg must be a symbol" from the enquo call, or a complaint that parameter is not a vector (which I tried to work around here with "c(parameter)"). I am stuck and cannot see what else to change.

If anyone has a suggestion, I'll be more than happy to try it out on my code. In any case, I'd be extremely grateful for your help!

ATB
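
One way the errors described above might be avoided, as a minimal sketch rather than a definitive fix: enquo() expects the bare argument name, so enquo(c(parameter)) is what triggers "arg must be a symbol"; !! only unquotes inside tidy-eval functions, so in a plain if() it acts as a double negation; and split() needs the column's values rather than a pasted ".$..." string. The version below turns the captured argument into a string with as_name() and uses that string for both the check and the split. The out_dir argument, the dir.create() call, and the returned vector of paths are additions made for illustration and are not part of the original function.

```r
library(dplyr)
library(purrr)
library(rlang)
library(data.table)

csv_splitter <- function(df, parameter, out_dir = "DATA") {
  # enquo() is given the bare argument; as_name() turns the captured
  # column name into a plain string such as "Species"
  col_name <- as_name(enquo(parameter))

  # Names of the character (categorical) columns
  categorical <- names(select(df, where(is.character)))

  if (!col_name %in% categorical) {
    message("'", col_name, "' is not a character column of the data; try another one")
    return(invisible(character()))
  }

  # Assumption: create the output folder if it does not exist yet
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # Split on the column's values and write one csv per category
  pieces <- split(df, df[[col_name]])
  paths <- file.path(out_dir, paste0("data_dfparam_", names(pieces), ".csv"))
  walk2(pieces, paths, ~ fwrite(.x, .y))

  message("The csv files are stored in ", out_dir)
  invisible(paths)
}
```

It would be called with a bare column name, for example csv_splitter(my_data, my_column), where my_data and my_column are placeholders. Note that split() drops rows whose value in that column is NA, so those rows end up in none of the files.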