I am a new user of R. I need to randomly split a dataset into two parts: the first containing 2000 observations as a training sample, and the other consisting of the remaining 1333 observations for validation. The total number of observations is 3333. How can I do this in R? Thank you very much.
- Other good dupes: [Randomly split data by criterion into training and test](http://stackoverflow.com/q/22518982/903061), [How to randomly split a data frame into smaller ones with given number of rows](http://stackoverflow.com/q/20041239/903061). – Gregor Thomas Feb 09 '16 at 22:51
- Please leave the title as R, not R Studio. RStudio is an editor that is popular for writing R code, but *the language is R* and it doesn't matter whether you use Vim, Emacs, Notepad, RStudio, Notepad++, Crimson Editor, Visual Studio, Eclipse, or anything else to write your R code. – Gregor Thomas Feb 09 '16 at 22:53
1 Answer
When selecting things randomly, you'll generally want to use `sample(...)`:
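> # Assign each row to training (TRUE) or testing (FALSE) at random.
> # Each row is drawn independently, so the split is roughly 50/50, not exactly 2000/1333.
> trainingIndices = sample(c(TRUE, FALSE), nrow(dataset), replace = TRUE)
> testingIndices = !trainingIndices
> trainingSet = dataset[trainingIndices,]
> testingSet = dataset[testingIndices,]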
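
If exactly 2000 training rows are required, as in the question, one possible approach is to sample row numbers without replacement rather than a logical vector. The following is a minimal sketch under assumptions: the `dataset` built here is a hypothetical stand-in with made-up columns `id` and `x`, and the seed value is arbitrary.

```r
# Hypothetical stand-in for the asker's data: 3333 rows, made-up columns.
set.seed(42)  # arbitrary seed, used only to make the split reproducible
dataset <- data.frame(id = seq_len(3333), x = rnorm(3333))

# Draw 2000 distinct row numbers from 1..nrow(dataset), without replacement.
trainingRows <- sample(nrow(dataset), 2000)

# Positive indices keep the sampled rows; negative indices drop them.
trainingSet <- dataset[trainingRows, ]
testingSet  <- dataset[-trainingRows, ]
```

Because the testing set is defined as everything not sampled, the two sets cannot overlap and together cover every row of `dataset`.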

Señor O