I am looking for a robust way to partition a dataset without using the sample()
function, and I hope to get some feedback.
Ideally, I would like to get rid of the randomness inherent in the use of sample():
samp <- data.frame(qldat)  # convert zoo time-series object to a data.frame
# split the data series between training and test sets
ind <- sample(2, nrow(samp), replace = TRUE, prob = c(0.8, 0.2))
tsamp <- samp[ind == 1, ]  # training set
vsamp <- samp[ind == 2, ]  # test set
After some research, I figured out that subset() could help, but it would involve
a bit of hard-coding against the dataset. By hard-coding I mean that, for an 80:20 (%)
split, it is possible to use nrow(samp) to take rows 1 to 0.8 * nrow(samp) as the
training set and the remaining rows as the test set, acknowledging that this might
not be a very efficient solution.
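For clarity, this is roughly the hard-coded, index-based split I have in mind (just a sketch; it assumes the rows of samp are already in the order I want to split on, e.g. chronological, and rounds 0.8 * nrow(samp) down):

n <- nrow(samp)
cutoff <- floor(0.8 * n)             # last row index of the training block
tsamp <- samp[seq_len(cutoff), ]     # first 80% of rows -> training set
vsamp <- samp[seq(cutoff + 1, n), ]  # remaining 20% of rows -> test set

Since the original object is a time series, keeping the chronological order intact like this may actually be desirable.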
I've also tried createDataPartition() from caret, but it did not match my expectations,
since samp does not hold any categorical column on which I could rely for the split
(e.g. createDataPartition(y = samp$categoricaldata, p = 0.8, list = FALSE)).
PS: What I like about the ind <- sample(...) line is the prob = c(0.8, 0.2) argument,
which sorts out the split proportions automatically. Any similar idea that produces
tsamp and vsamp without a random split would be very much appreciated.
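To illustrate, something along these lines is the kind of non-random counterpart to prob = c(0.8, 0.2) I am picturing (only a sketch; it builds the ind labels by position rather than by a random draw):

n <- nrow(samp)
ntrain <- round(0.8 * n)                            # rows assigned to group 1 (training)
ind <- rep(c(1, 2), times = c(ntrain, n - ntrain))  # deterministic 80:20 labels
tsamp <- samp[ind == 1, ]                           # training set
vsamp <- samp[ind == 2, ]                           # test set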
Best,