
I am a new user of R. I need to split a dataset randomly into two parts: the first, containing 2000 observations, is the training sample, and the second, containing the remaining 1333 observations, is used for validation. The total number of observations is 3333. How can I do this in R? Thank you very much.

Gregor Thomas
Shikar
  • Other good dupes: [Randomly split data by criterion into training and test](http://stackoverflow.com/q/22518982/903061), [How to randomly split a data frame into smaller ones with given number of rows](http://stackoverflow.com/q/20041239/903061). – Gregor Thomas Feb 09 '16 at 22:51
    Please leave the title as R not R Studio. RStudio is an editor that is popular for writing R code, but *the language is R* and it doesn't matter whether you use Vim, Emacs, Notepad, RStudio, Notepad++, Crimson Editor, Visual Studio, Eclipse, or anything else to write your R code. – Gregor Thomas Feb 09 '16 at 22:53

1 Answer


When selecting things randomly, you'll generally want to use sample(...):

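    # Assign each row TRUE or FALSE with equal probability: a roughly (not exactly) 50/50 split
    trainingIndices <- sample(c(TRUE, FALSE), nrow(dataset), replace = TRUE)
    testingIndices <- !trainingIndices
    trainingSet <- dataset[trainingIndices, ]
    # The test set must use the complementary indices
    testingSet <- dataset[testingIndices, ]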
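The logical-vector approach above gives a random split of about half and half, not the exact 2000/1333 split the question asks for. A minimal sketch of an exact split, assuming the data frame is called `dataset` (here a hypothetical stand-in with 3333 rows), draws exactly 2000 distinct row numbers with `sample()` and uses negative indexing for the rest:

```r
# Hypothetical stand-in for the asker's data: 3333 rows, as in the question.
dataset <- data.frame(id = seq_len(3333), x = rnorm(3333))

set.seed(42)  # makes the random split reproducible; any seed works

# sample(n, size) draws `size` distinct values from 1:n (no replacement by default).
trainRows <- sample(nrow(dataset), 2000)

trainingSet   <- dataset[trainRows, ]   # exactly 2000 rows
validationSet <- dataset[-trainRows, ]  # the remaining 1333 rows

nrow(trainingSet)    # [1] 2000
nrow(validationSet)  # [1] 1333
```

Because the validation set is defined as "every row not drawn", the two sets cannot overlap and together cover all 3333 rows.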
Señor O