I am trying to run a function like the following to balance a training set with the ROSE package:

library(ROSE)

rose <- function(df){
  str(df)
  set.seed(124)
  intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
  train <- df[intrain,]
  train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data
  return(train.rose)
}

data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset

rose(df)

Calling the above script generates the following error message:

Error in terms.formula(formula, data = frml.env) : 
 'data' argument is of the wrong type 

However, everything works when I call ovun.sample(...) outside my local function rose, that is:

library(ROSE)

data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset

str(df)
set.seed(124)
intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[intrain,]
train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data

I understand that the problem arises when ovun.sample(..., data=train, ...) is called inside rose(), but I cannot figure out why. Could it be a problem with environments (variable scoping)?

Any idea?

s.dallapalma

1 Answer

I ran the code without the set.seed() call inside the function and it worked for me; you should set the seed outside the function. It is also possible that another attached package is causing a conflict in your R session.

rose <- function(df){
  str(df)
  intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
  train <- df[intrain,]
  train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data
  return(train.rose)
}

data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset

set.seed(1234)
head(rose(df))
'data.frame':   1250 obs. of  3 variables:
 $ cls: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ x1 : num  0.2008 0.0166 0.2287 0.1264 0.6008 ...
 $ x2 : num  0.678 1.5766 -0.5595 -0.0938 -0.2984 ...
  cls         x1         x2
1   0 -0.2247632  0.6806409
2   0  0.3437585 -1.0202996
3   0 -1.0226182  1.9629034
4   0  0.7245372 -0.2494658
5   0 -0.8972314  0.2397664
6   0  0.3361091 -0.2661655

Also, the str() output shown above refers to the original df, not to the transformed data.
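
To narrow this down, two checks in base R may help. This is only a diagnostic sketch, not a confirmed cause. The first part shows that terms() raises the same message when its data argument is not something it can treat as a data frame (a function is passed on purpose here), which suggests that, inside rose(), something other than the train data frame may be reaching terms(). The second part prints the installed ROSE version, if any, and the names that are masked in the current session. The variable names msg and masked are illustrative only.

```r
# 1) Reproduce the message in base R: terms() is given a function,
#    not a data frame, as `data` (done on purpose for illustration).
msg <- tryCatch(
  {
    terms(y ~ x, data = function() NULL)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# 2) Inspect the session: installed ROSE version (if any) and names
#    that exist in more than one attached package or environment.
if (requireNamespace("ROSE", quietly = TRUE)) {
  print(packageVersion("ROSE"))
}
masked <- conflicts()
print(masked)
```

If a name such as ovun.sample shows up in masked, retrying in a clean R session with only ROSE attached would be a reasonable next step.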

  • I cannot see how moving the seed out of the function, or removing it, would make the code work. Still, I guess it could be a library problem, as the issue persists. Thanks for the reply, btw! – s.dallapalma Aug 10 '19 at 18:55