I have a classification problem with a large DATASET of 308,500 rows. I want to split the data into a training set and a test set in order to build a model.

However, I want the training set to be sampled from DATASET at regular intervals, for example one row every 1,000 rows, so that I know it is built from rows spread across the whole DATASET. Is there a way to do this?

For example, I'd like something like this:

train = DATASET[take sample every 1000 rows]
Ronak Shah
  • have you seen this similar [post](https://stackoverflow.com/questions/30885047/how-to-non-randomly-sample-every-n-rows-in-dplyr)? – mnm May 16 '20 at 00:36

1 Answer

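You can use seq() to build the indices of the rows to keep, then subset DATASET with them. Negative indices drop those rows, so the test set is everything that was not selected.

# Row indices 1, 1001, 2001, ... up to nrow(DATASET)
train_inds <- seq(1, nrow(DATASET), by = 1000)
# Every 1000th row goes to the training set
train <- DATASET[train_inds, ]
# All remaining rows go to the test set
test <- DATASET[-train_inds, ]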
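
To see what this split looks like at the size mentioned in the question, here is a minimal sketch on a made-up data frame. The columns id and x and the variable step are hypothetical stand-ins, not part of the original data; only the row count (308,500) and the interval (1000) come from the question.

```r
# Toy stand-in for DATASET: same number of rows as in the question,
# with hypothetical columns `id` and `x`.
set.seed(1)
DATASET <- data.frame(id = seq_len(308500), x = rnorm(308500))

step <- 1000  # hypothetical name for the sampling interval

# Indices 1, 1001, 2001, ..., 308001
train_inds <- seq(1, nrow(DATASET), by = step)

train <- DATASET[train_inds, ]   # every 1000th row
test  <- DATASET[-train_inds, ]  # everything else

nrow(train)        # 309
nrow(test)         # 308191
head(train$id, 3)  # 1 1001 2001
```

Note that with an interval of 1000 the training set has only 309 rows, while the test set keeps the other 308,191. A smaller interval gives a larger training set.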
Ronak Shah