I am having some trouble with the caret package. I am new to R and I am trying to build a multiple linear regression model. I need to split my data into training and testing sets. I tried to use caret's `createDataPartition`, but I get an error. I have a data frame with 7 variables (columns) and 511 samples (rows).

train <- createDataPartition(mydata, p=0.8, list = FALSE, times = 1)

The error:

Error in table(y) : attempt to make a table with >= 2^31 elements 
jmuhlenkamp
sprc
  • Please add `dput(head(df))` with df your data.frame – loki Aug 19 '17 at 18:33
  • According to the documentation, `y` should be a vector, not a data.frame. – Roman Luštrik Aug 19 '17 at 18:34
  • Thank you both for responding! I think the problem was I used my data.frame and not a vector (ex. mydata$myvariable). The data that I am using to build my model is numeric, but does it matter which variable I use to split the data? – sprc Aug 19 '17 at 20:38
  • Split the data using the outcome variable (`y` as per the documentation). I.e., if your regression model is in the form `y ~ x1 + x2 + ... + x6`, use `createDataPartition(mydata$y, ...)`. Also, any particular reason why you are choosing `list = FALSE` rather than the default `list = TRUE`? – Z.Lin Aug 20 '17 at 03:21

1 Answer

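The first argument of `createDataPartition` must be a vector, not the whole data frame, so pass a single column of your dataset (usually the outcome variable, as the comments above suggest):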

train <- createDataPartition(mydata$anycolumn, p=0.8, list = FALSE, times = 1)
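
For context, here is a minimal sketch of how the returned indices might then be used to build the training and testing sets and fit the regression. The data below is synthetic, and the column names `y` and `x1` to `x6` are assumptions chosen to match the 7-column, 511-row shape described in the question; substitute your own data frame and outcome column.

```r
library(caret)  # provides createDataPartition()

set.seed(42)  # make the random split reproducible

# Hypothetical stand-in for `mydata`: 511 rows, 7 numeric columns,
# with `y` as the outcome and x1..x6 as predictors (assumed names).
mydata <- as.data.frame(matrix(rnorm(511 * 7), nrow = 511))
names(mydata) <- c("y", paste0("x", 1:6))

# Pass the outcome vector, not the whole data frame.
train_idx <- createDataPartition(mydata$y, p = 0.8, list = FALSE, times = 1)

# With list = FALSE the result is a one-column matrix of row indices,
# which can be used directly to subset rows.
training <- mydata[train_idx, ]
testing  <- mydata[-train_idx, ]

# Fit the multiple linear regression on the training rows only.
fit <- lm(y ~ ., data = training)

# Evaluate on the held-out rows, here with root mean squared error.
pred <- predict(fit, newdata = testing)
rmse <- sqrt(mean((testing$y - pred)^2))
```

The split is roughly, not exactly, 80/20, because for a numeric outcome `createDataPartition` samples within quantile-based groups of `y` so that both sets cover a similar range of the outcome.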
pasqual.en