How to split data into train set (and test set) every nrows in R?

Question

I've got a classification problem with a huge DATASET containing 308,500 rows. I want to split these data into a train set and a test set in order to build a model.

But I want the train set to take a sample from DATASET every n rows, for example every 1,000 rows, so that I know the train set is built from rows spread across the whole DATASET. Is there a way to do this?

For example, I'd like something like this:

train = DATASET[take sample every 1000 rows]

Have you seen this similar [post](https://stackoverflow.com/questions/30885047/how-to-non-randomly-sample-every-n-rows-in-dplyr)? – mnm, May 16 '20 at 00:36

Ronak Shah · Accepted Answer · 2020-05-07T07:34:25.490


You can use `seq()` to create the indices of the rows to subset, then use negative indexing to keep the remaining rows for the test set.

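# Row numbers 1, 1001, 2001, ... up to nrow(DATASET)
train_inds <- seq(1, nrow(DATASET), 1000)
train <- DATASET[train_inds, ]
# A negative index drops the train rows; everything else goes to the test set
test <- DATASET[-train_inds, ]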
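To see what this produces at the size mentioned in the question, here is a sketch on a simulated data frame. The 308,500-row `DATASET` and its columns `x` and `label` are made up for illustration; only the three subsetting lines come from the answer.

```r
# Hypothetical stand-in for the question's DATASET: 308,500 rows, two columns
set.seed(42)
DATASET <- data.frame(
  x     = rnorm(308500),
  label = sample(c("a", "b"), 308500, replace = TRUE)
)

# Every 1,000th row (1, 1001, 2001, ...) goes to the train set
train_inds <- seq(1, nrow(DATASET), 1000)
train <- DATASET[train_inds, ]
# Negative indexing drops those rows, leaving everything else for the test set
test <- DATASET[-train_inds, ]

nrow(train)  # 309
nrow(test)   # 308191

# The two parts share no rows and together cover the whole data frame
length(intersect(rownames(train), rownames(test)))  # 0
```

Note that with a step of 1,000 the train set holds only about 0.1% of the rows; a smaller step puts more rows into train.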

answered May 07 '20 at 07:22 by Ronak Shah, edited May 07 '20 at 07:34

Thanks! And do you know how to take the rest of the data for the test set? – Giannis Lazaridis May 07 '20 at 07:31
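The answer's `DATASET[-train_inds, ]` line already gives the rest of the data. As an alternative sketch, the same split can be written as a logical mask built with `%%`, so the test set is simply the mask's negation. The helper `split_every_n`, its optional `start` argument (which lets you begin at a random row so row 1 is not always in train) and the `toy` data frame are hypothetical additions, not part of the original answer.

```r
# Hypothetical helper: put every n-th row into train, starting at row `start`,
# and every other row into test.
split_every_n <- function(df, n, start = 1) {
  stopifnot(n >= 1, start >= 1, start <= n)
  # TRUE for rows start, start + n, start + 2n, ...
  # (R's %% takes the sign of the divisor, so rows before `start` give a non-zero remainder)
  in_train <- (seq_len(nrow(df)) - start) %% n == 0
  list(
    train = df[in_train, , drop = FALSE],
    # Negating the mask gives exactly the complement: no row is lost or duplicated
    test  = df[!in_train, , drop = FALSE]
  )
}

# Usage on a small example
toy <- data.frame(id = 1:10)
parts <- split_every_n(toy, n = 3)
parts$train$id  # 1 4 7 10
parts$test$id   # 2 3 5 6 8 9

# Random starting row, e.g. for the question's data:
# parts <- split_every_n(DATASET, n = 1000, start = sample(1000, 1))
```

With `start = 1` this selects the same rows as `seq(1, nrow(DATASET), 1000)`; `drop = FALSE` keeps the result a data frame even when it has a single column.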
