I am writing a Python program that interacts with R. Basically, I have some R libraries that I want to use from my Python code. After installing rpy2, I defined the R functions that I want to use in a separate .R script file.
The R function needs to be given a formula in order to apply a resampling technique. Below is the R function that I wrote:
WFRandUnder <- function(target_variable, other, train, rel, thr.rel, C.perc, repl) {
  a <- target_variable
  b <- '~'
  form_begin <- paste(a, b, sep = ' ')
  fmla <- as.formula(paste(form_begin, paste(other, collapse = "+")))
  undersampled <- RandUnderRegress(fmla, train, rel, thr.rel, C.perc, repl)
  return(undersampled)
}
From Python, I pass the target variable name along with a list containing the names of all the other columns. I want the formula to look like this:
my_target_variable ~ all other columns
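To make the goal concrete, here is a minimal Python sketch of the formula text I am after. The helper `build_formula` and the column names are hypothetical and only for illustration. The backtick quoting covers names that R would not parse as bare symbols (spaces, dashes, leading digits); I have not confirmed that this is related to my problem, and the check ignores R reserved words.

```python
import re

# Hypothetical helper (not part of rpy2): builds the formula text on the Python side.
# Simplified test for R "syntactic" names: starts with a letter, or with a dot that is
# not followed by a digit, then only letters, digits, dots and underscores.
_SYNTACTIC = re.compile(r"(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*")


def build_formula(target, others):
    """Return 'target ~ col1 + col2 + ...' as a plain string."""

    def quote(name):
        # Wrap names R would not accept as bare symbols in backticks.
        return name if _SYNTACTIC.fullmatch(name) else "`" + name + "`"

    return quote(target) + " ~ " + " + ".join(quote(c) for c in others)


print(build_formula("price", ["area", "rooms", "year built"]))
# prints: price ~ area + rooms + `year built`
```

The resulting string is the same text that the R function assembles with paste() before calling as.formula().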
However, in these lines:
a <- target_variable
b <- '~'
form_begin <- paste(a, b, sep=' ')
fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
The formula is not always built correctly when my data has many columns. What should I do to make it always work? I am concatenating all the column names with the + operator.