
I have a 200x3 matrix that I want to split into 3 randomly chosen, disjoint sets. How can I do this?

I tried to do it with the `sample` function, but `sample` only accepts vectors, and its output is not actually part of my matrix.

Here is (part of) my matrix:

          X1           X2     Y
1   -3.381342627  1.037658397 0
2    3.329754336  1.964180648 0
3    1.760001645 -3.414310545 0
4   -2.450315854 -2.299838395 0
5   -3.334593596  0.069458604 0
6    1.708921101 -2.333932571 0
7   -2.650506645  0.348985289 0
8   -2.935307106 -0.402072990 0
9    2.867566309 -3.217712074 0
10   3.617603017  1.956535384 0

I want to split it into 3 sets like the ones below, with the rows chosen at random. I also want to be able to specify the size of each set; in this example the sizes are 4, 4 and 2.

9    2.867566309 -3.217712074 0
3    1.760001645 -3.414310545 0
1   -3.381342627  1.037658397 0
2    3.329754336  1.964180648 0


5   -3.334593596  0.069458604 0
8   -2.935307106 -0.402072990 0
4   -2.450315854 -2.299838395 0
6    1.708921101 -2.333932571 0


10   3.617603017  1.956535384 0
7   -2.650506645  0.348985289 0
Asqan
  • Must the 3 random sets have the same dimensions? – RJ- Aug 14 '13 at 02:23
  • Yes, they should all have 3 columns, so each one is a matrix. – Asqan Aug 14 '13 at 02:31
  • And will they have equal numbers of rows? 200 is not divisible by 3. – RJ- Aug 14 '13 at 03:06
  • I'm not sure what you want in terms of the number of rows in each set. Do you want a function that lets you specify the sizes, do you want the sets as equally sized as possible, or are the sizes fixed? – David Aug 14 '13 at 03:39
  • My aim was to split the data into different sets, such as training and test sets. – Asqan Aug 14 '13 at 04:26
  • Use `sample` to shuffle all the row indices, then split them any way you like and select from the original matrix. – eddi Aug 14 '13 at 05:33

3 Answers


Here is one way:

# a matrix with 3 columns
m <- matrix(runif(300), ncol=3)

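# split into a list of data frames, assigning each row to one of 3 sets at random
# (set sizes vary from run to run; convert back with as.matrix() if needed)
m_split <- split(as.data.frame(m), sample(1:3, size=nrow(m), replace=TRUE))

# count the number of rows in each set
sapply(m_split, nrow)

# Or, to split by a given number of rows per set, shuffle the set labels
# (without sample(), rep() would put the first 30 rows in set 1, and so on)
nsplit <- c(30,30,40)
m_split2 <- split(as.data.frame(m), sample(rep(1:3, nsplit)))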
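The same idea can be wrapped in a small function that takes the set sizes and returns matrices rather than data frames. This is a minimal sketch, not part of the answer above: `split_rows` is a hypothetical name, and it assumes the sizes add up to `nrow(m)`.

```r
# Hypothetical helper (not from the answer above): split the rows of a matrix
# into random, disjoint sets of the requested sizes, keeping each set a matrix.
split_rows <- function(m, sizes) {
  stopifnot(sum(sizes) == nrow(m))
  # one set label per row, e.g. 1,1,1,1,2,2,2,2,3,3 for sizes c(4, 4, 2)
  labels <- rep(seq_along(sizes), sizes)
  # shuffle the labels; sample.int() avoids sample()'s special case for length-1 input
  groups <- labels[sample.int(length(labels))]
  # split the row indices (not the matrix) by label; factor() keeps empty sets
  idx <- split(seq_len(nrow(m)), factor(groups, levels = seq_along(sizes)))
  # drop = FALSE keeps single-row sets as 1-row matrices
  lapply(idx, function(i) m[i, , drop = FALSE])
}

set.seed(1)
m <- matrix(runif(30), ncol = 3)  # 10 rows, like the excerpt in the question
sets <- split_rows(m, c(4, 4, 2))
sapply(sets, nrow)  # 4, 4 and 2 rows; only which rows land in which set is random
```

Splitting the row indices instead of the matrix is what lets each set stay a matrix, since `split()` on a matrix itself would split its individual elements.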
Remko Duursma

I solved it as follows (maybe not the best way, but it works):

nsamples = nrow(dat)
# first draw random row indices: 40% of all samples
sampleInd = sample(nsamples, floor(0.4*nsamples))
# build the first (validation) set from the first half of the drawn indices
valInd = sampleInd[1:floor(length(sampleInd)/2)]
valSet = dat[valInd,]
# the other half becomes the test set
testInd = sampleInd[(floor(length(sampleInd)/2)+1):length(sampleInd)]
testSet = dat[testInd,]
# the unused 60% becomes the training set
trainSet = dat[-sampleInd,]
ntrain = nrow(trainSet)

The percentages can be changed as you wish. The idea is to use `sample` to draw random row indices, split those indices into groups, and then use each group of indices to subset the original matrix.
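The same index-based idea generalizes to any list of proportions. Below is a minimal sketch, not the code above: `split_by_prop` is a hypothetical helper, and it assumes the proportions sum to 1 and that any rows left over after rounding down go to the first set.

```r
# Hypothetical helper: split the rows of `dat` into random, disjoint sets whose
# sizes follow the given proportions, e.g. 60% training / 20% validation / 20% test.
split_by_prop <- function(dat, props) {
  stopifnot(all(props >= 0), abs(sum(props) - 1) < 1e-8)
  n <- nrow(dat)
  sizes <- floor(props * n)
  # rounding down can leave a few rows over; add them to the first set
  sizes[1] <- sizes[1] + (n - sum(sizes))
  # shuffle all row indices once, then cut them into consecutive blocks
  shuffled <- sample.int(n)
  labels <- rep(seq_along(props), sizes)
  idx <- split(shuffled, factor(labels, levels = seq_along(props)))
  names(idx) <- names(props)
  lapply(idx, function(i) dat[i, , drop = FALSE])
}

set.seed(42)
dat <- matrix(rnorm(600), ncol = 3)  # 200 rows, as in the question
sets <- split_by_prop(dat, c(train = 0.6, val = 0.2, test = 0.2))
sapply(sets, nrow)
```

With 200 rows and these proportions, the sets get 120, 40 and 40 rows.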

Asqan

The idea I mentioned in the comments:

# shuffle rows
rows = sample(nrow(m))

# split any way you like, e.g. 4/4/rest
rows.split = split(rows, c(rep(1,4), rep(2,4), rep(3,nrow(m) - 4 - 4)))

# subset the matrix (drop = FALSE keeps single-row sets as matrices)
lapply(rows.split, function(x) m[x, , drop = FALSE])
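A variant of this approach takes the set sizes as a vector (here 4, 4 and 2, as in the question) instead of hard-coding them, and checks that the index sets are disjoint before subsetting. It is a sketch that assumes the sizes add up to `nrow(m)`; the random data only stands in for the real matrix.

```r
set.seed(7)
m <- matrix(rnorm(30), ncol = 3)  # 10 rows, like the excerpt in the question
sizes <- c(4, 4, 2)               # the set sizes from the question

# shuffle the row indices, then label them 1,1,1,1,2,2,2,2,3,3
rows <- sample(nrow(m))
rows.split <- split(rows, rep(seq_along(sizes), sizes))

# the index sets are disjoint and together cover every row exactly once
stopifnot(!anyDuplicated(unlist(rows.split)),
          setequal(unlist(rows.split), seq_len(nrow(m))))

# subset the matrix, keeping each set as a matrix
sets <- lapply(rows.split, function(x) m[x, , drop = FALSE])
```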
eddi