2

Possible Duplicate:
Randomly selecting values from an existing matrix after adding a vector (in R)

This is a follow up to my question from last week and can be found here. I wasn't sure if it was appropriate to post this question in the same place, or to post it as a new question.

Okay, last time I asked about randomly removing values from a matrix after binding a new vector to it. The answers were very useful, but I have found a bug when I am using a non square matrix. I have been running the code in a loop and taking the sum of the matrix each time to ensure that it is working properly, but I have found that the sum varies, which would imply that the code is sometimes selecting the wrong value in the matrix (I want it to only select and replace ones).

Here is the code:

mat1<-matrix(c(1,0,1,0, 0,1,1,1, 1,0,0,0, 1,0,0,1, 1,1,1,1, 0,0,0,1),byrow=F, nrow=4)
I.vec<-c(0,1,1,1,0,0)


foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec == 1L)
rows <- sapply(seq_along(cols), 
    function(x, mat, cols) {
        ones <- which(mat[,cols[x]] == 1L)
        sample(ones, 1)
        }, mat = mat, cols = cols)
ind <- (nr*(cols-1)) + rows
mat[ind] <- 0
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}

set.seed(2)

for (j in 1:1000){                             #run this vector through the simulations
     I.vec2=sample(I.vec,replace=FALSE)       #randomize interactions
     temp=foo(mat1,I.vec2)                    #run foo function
     prop=sum(temp)
     print.table(prop)
     }

In this case, sometimes the sum of the matrix is 13 and sometimes it is 14, when it should always be = sum(mat1) = 13.

I've tried to pick apart the code, and I think everything is working correctly except for the rows function, which, admittedly, I do not fully understand.

Community
  • 1
  • 1
Laura
  • 679
  • 2
  • 5
  • 14
  • 1
    "follow-up questions" are generally not suitable to this site's format because they lead to localized solutions. This is _especially_ true when the follow-up presents a problem with the accepted answer. How else is someone who only sees your previous question supposed to know it didn't work as you expected? One of the awesome things about this site is that _you_ can help others by asking questions. – Joshua Ulrich Aug 04 '11 at 00:29
  • Sorry! Should I avoid posting the follow up altogether? I wasn't sure how to add it to the other question. – Laura Aug 04 '11 at 00:39
  • 2
    It also appears you have two accounts named "Laura" that should be merged together: http://stackoverflow.com/users/866870/laura and http://stackoverflow.com/users/874102/laura. I would ask a moderator to merge them for you by emailing team@stackoverflow.com and letting them know. – Chase Aug 04 '11 at 00:47
  • In this case, yes I would add to your original question with your non-square matrix and explain what isn't working about it. – Chase Aug 04 '11 at 00:48
  • I will vote to Close this, whilst the Q above is great follow-up and allowed me to see where the problem was relatively easily, we need to fix the original Q&A. Which I have now done. My Answer below is merely holding so others don't waste time trying to solve this until it gets closed. Can others commenting here also help to close please? TIA – Gavin Simpson Aug 04 '11 at 01:06
  • I add an answer to the original question. I now see Gavin is still awake... Well anyway, problem is the behaviour of "sample". Gavins solution is faster than mine, and maybe easier to follow. So mine's gone again. – Joris Meys Aug 04 '11 at 01:37

1 Answers1

1

The problem is a feature of sample(). I will update the original Q, but the problem is due to a single 1 being observed in a column in the candidate matrix. The code that generates rows is trying to sample from a set of 1. Unfortunately, I forgot that sample() has a feature that when the first argument is a vector of length 1, sample() treats it as if you wanted to sample from the set 1, ..., n where n was the value of the single element in the set youe really wanted to sample from.

A simple example illustrates the behaviour when the argument x to sample() is a length 1 vector:

> set.seed(1)
> replicate(10, sample(4, 1))
 [1] 2 2 3 4 1 4 4 3 3 1

Instinctively, these should all be 4, but they aren't because of this documented and well known feature.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453