I am trying to run a function like the following to balance a training set with the ROSE package:
library(ROSE)
rose <- function(df){
  str(df)
  set.seed(124)
  intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
  train <- df[intrain,]
  train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data
  return(train.rose)
}
data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset
rose(df)
Calling the above script generates the following error message:
Error in terms.formula(formula, data = frml.env) :
'data' argument is of the wrong type
Instead, everything works fine when I call ovun.sample(...) outside my local function rose, that is:
library(ROSE)
data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset
str(df)
set.seed(124)
intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[intrain,]
train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data
I understand that the problem arises when ovun.sample(..., data=train, ...) is called inside rose(), but I cannot figure out why. Could it be a problem with environments, i.e. with where the variable train is looked up?
Any ideas?
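To make the suspicion about environments concrete, here is a minimal base-R sketch of the kind of behaviour I have in mind. It does not use ROSE at all, and I am not claiming this is what ovun.sample does internally. The helpers lookup_global and lookup_caller, the function wrapper, and the variable local_only_train are all hypothetical names made up for this illustration.

```r
# Hypothetical helpers (NOT part of ROSE): each one takes the *name* of the
# object passed as `data` and checks whether that name exists in a given
# environment, without following enclosing environments.

lookup_global <- function(data) {
  nm <- deparse(substitute(data))                     # name of the variable passed in
  exists(nm, envir = globalenv(), inherits = FALSE)   # look only in the global env
}

lookup_caller <- function(data) {
  nm <- deparse(substitute(data))
  exists(nm, envir = parent.frame(), inherits = FALSE) # look only in the caller's frame
}

wrapper <- function() {
  # This data frame exists only inside wrapper(), like `train` inside rose()
  local_only_train <- data.frame(x = 1:3)
  in_global <- lookup_global(local_only_train)
  in_caller <- lookup_caller(local_only_train)
  c(global = in_global, caller = in_caller)
}

res <- wrapper()
print(res)
```

If this reasoning is right, the lookup in the global environment should fail for a variable that only exists inside wrapper(), while the lookup in the caller's frame should succeed. That would match what I observe: the call works at top level, where train is a global variable, and fails when train is local to rose().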