I am writing a Python program that interacts with R. Basically, I have some R libraries that I want to use from my Python code. After installing rpy2, I defined the R functions I want to use in a separate .R script.

The R function needs a formula passed to it in order to apply a resampling technique. Below is the R function that I wrote:

WFRandUnder <- function(target_variable, other, train, rel, thr.rel, C.perc, repl){
    a <- target_variable
    b <- '~'
    form_begin <- paste(a, b, sep=' ')
    fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
    undersampled = RandUnderRegress(fmla, train, rel, thr.rel, C.perc, repl)
    return(undersampled)
}

From Python, I pass the target variable name and a list containing the names of all the other columns, because I want the formula to look like this: my_target_variable ~ all other columns

However, in these lines:

    a <- target_variable
    b <- '~'
    form_begin <- paste(a, b, sep=' ')
    fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))

The formula is not always built correctly when my data has many columns. What should I do to make it always work? I am concatenating all the column names with the + operator.
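The question does not show the error, so the cause is unclear. One possibility (an assumption, not something confirmed in this thread) is that some column names are not syntactic R names, for example names containing spaces or hyphens, which need backticks inside a formula. A minimal Python sketch of building the formula string with such names quoted; `quote_r_name` and `build_formula_string` are hypothetical helpers, not part of rpy2:

```python
import re

# Assumption: a "syntactic" R name uses only letters, digits, '.' and '_',
# and starts with a letter or with '.' not followed by a digit.
# R reserved words (if, else, ...) are not handled in this sketch.
_SYNTACTIC = re.compile(r"^(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*$")


def quote_r_name(name):
    """Return the name unchanged if syntactic, otherwise wrapped in backticks."""
    if "`" in name:
        raise ValueError(f"column name contains a backtick: {name!r}")
    if _SYNTACTIC.match(name):
        return name
    return "`" + name + "`"


def build_formula_string(target, predictors):
    """Build 'target ~ p1 + p2 + ...' as a string, skipping the target itself."""
    terms = [quote_r_name(p) for p in predictors if p != target]
    if not terms:
        raise ValueError("at least one predictor column is required")
    return f"{quote_r_name(target)} ~ {' + '.join(terms)}"


print(build_formula_string("y", ["x1", "col 2", "sale-price"]))
```

The resulting string could then be passed to as.formula on the R side in place of the pasted column names.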

Perl Del Rey
Instead of creating a new `formula` object, I'd subset the data.frame with just `other` and `target_variable`, leaving the same formula (`target_variable~.`). – nicola Feb 27 '20 at 11:13
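The idea in this comment can be sketched on the Python side as follows. This is only an illustration: a plain dict of column name to values stands in for the data frame, and `subset_for_dot_formula` is a hypothetical helper name.

```python
def subset_for_dot_formula(columns, target, predictors):
    """Keep only the target and predictor columns and return them together
    with a 'target ~ .' formula string.

    `columns` is a dict mapping column name -> list of values, used here as
    a stand-in for a data frame.
    """
    # Target first, then the predictors, without repeating the target.
    keep = [target] + [p for p in predictors if p != target]
    missing = [name for name in keep if name not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    subset = {name: columns[name] for name in keep}
    # With only the wanted columns left, '.' covers exactly the predictors.
    return subset, f"{target} ~ ."


data = {"y": [1, 2], "a": [3, 4], "b": [5, 6], "id": [7, 8]}
subset, formula = subset_for_dot_formula(data, "y", ["a", "b"])
print(list(subset), formula)
```

With a pandas data frame, the same selection would usually be written as df[[target] + other] before handing the data to R.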

1 Answer

Thanks to @nicola, I was able to solve this problem by doing the following:

create_formula <- function(target_variable, other){
    # y <- target_variable
    # tilda <- '~'
    # form_begin <- paste(y, tilda, sep=' ')
    # fmla <- as.formula(paste(form_begin, paste(other, collapse= "+")))
    # return(fmla)
    y <- target_variable
    fmla = as.formula(paste(y, '~ .'))
    return(fmla)
}

I call this function from my Python program using rpy2. This causes no problem because the formula is always used together with the data itself, so the . on the right-hand side expands to every column other than the target. Sample code to demonstrate what I mean:

        if self.smogn:
            smogned = runit.WFDIBS(

                # here is the formula call (get_formula is a Python function that calls create_formula defined above in R)
                fmla=get_formula(self.target_variable, self.other),

                # here is the data 
                dat=df_combined,

                method=self.phi_params['method'][0],
                npts=self.phi_params['npts'][0],
                controlpts=self.phi_params['control.pts'],
                thrrel=self.thr_rel,
                Cperc=self.Cperc,
                k=self.k,
                repl=self.repl,
                dist=self.dist,
                p=self.p,
                pert=self.pert)
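The answer does not show get_formula itself. A minimal sketch of what such a wrapper might look like is given below, assuming the R script has been sourced through rpy2; the script path, the `load_create_formula` helper and the third `create_formula` argument are assumptions made for illustration, not the author's code.

```python
def load_create_formula(r_script_path):
    """Source the R script and return a Python callable for create_formula.

    rpy2 is imported lazily so the rest of this sketch works without it.
    """
    import rpy2.robjects as ro

    ro.r["source"](r_script_path)  # defines create_formula in R's global env
    r_create_formula = ro.globalenv["create_formula"]

    def create_formula(target_variable, other):
        # Convert the Python list to an R character vector explicitly.
        return r_create_formula(target_variable, ro.StrVector(list(other)))

    return create_formula


def get_formula(target_variable, other, create_formula):
    """Validate the inputs and delegate to the R-backed create_formula."""
    if not isinstance(target_variable, str) or not target_variable.strip():
        raise ValueError("target_variable must be a non-empty string")
    # Make sure the target never appears among the predictors.
    predictors = [name for name in other if name != target_variable]
    return create_formula(target_variable, predictors)
```

Usage would be along the lines of get_formula(target, other, load_create_formula("my_functions.R")), where the file name is a placeholder for the author's .R script.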

Perl Del Rey