6

Thank you so much for your help in advance!

I am trying to modify an existing matrix such that, when a new line is added to the matrix, it removes values from the preexisting matrix.

For example, I have the matrix:

[,1] [,2] [,3] [,4]
 1     1    0    0
 0     1    0    0
 1     0    1    0
 0     0    1    1

I want to add another vector, I.vec, which has two values (I.vec=c(0,1,1,0)). This is easy enough to do. I just rbind it to the matrix. Now, for every column where I.vec is equal to 1, I want to randomly select a value from the other rows and make it zero. Ideally, this would end up with a matrix like:

[,1] [,2] [,3] [,4]
 1     0    0    0
 0     1    0    0
 1     0    0    0
 0     0    1    1
 0     1    1    0

But each time I run the iteration, I want it to randomly sample again.

So this is what I have tried:

mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I<-rbind(mat1,I.vec)
mat.I.r<-mat.I
d1<-mat.I[,which(mat.I[5,]==1)]
mat.I.r[sample(which(d1[1:4]==1),1),which(mat.I[5,]==1)]<-0

But this only deletes one of the two values I would like to delete. I have also tried variations on subsetting the matrix, but I have not been successful.

Thank you again!

Andrie
  • 176,377
  • 47
  • 447
  • 496
Laura
  • 679
  • 2
  • 5
  • 14

2 Answers2

5

There is a little bit of ambiguity in the description from the OP, so two solutions are suggested:

Assuming that only existing 1s in relevant columns can be set to 0

I'll just alter the original function (see below). The change is to the line defining rows. I now have (there was a bug in the original - the version below is revised to handle deal with the bug):

rows <- sapply(seq_along(cols), 
                   function(x, mat, cols) {
                       ones <- which(mat[,cols[x]] == 1L)
                       out <- if(length(ones) == 1L) {
                                  ones
                              } else {
                                  sample(ones, 1)
                       }
                       out
                   }, mat = mat, cols = cols)

Basically, what this does is, for each column we need to swap a 1 to a 0, we work out which rows of the column contain 1s and sample one of these.

Edit: We have to handle the case where there is only a single 1 in a column. If we just sample from a length 1 vector, R's sample() will treat it as if we wanted to sample from the set seq_len(n) not from the length 1 set n. We handle this now with an if, else statement.

We have to do this individually for each column so we get the correct rows. I suppose we could do some nice manipulation to avoid repeated calls to which() and sample(), but how escapes me at the moment, because we do have to handle the case where there is only one 1 in the column. Here's the finished function (updated to handle the length 1 sample bug in the original):

foo <- function(mat, vec) {
    nr <- nrow(mat)
    nc <- ncol(mat)

    cols <- which(vec == 1L)
    rows <- sapply(seq_along(cols), 
                   function(x, mat, cols) {
                       ones <- which(mat[,cols[x]] == 1L)
                       out <- if(length(ones) == 1L) {
                                  ones
                              } else {
                                  sample(ones, 1)
                              }
                       out
                   }, mat = mat, cols = cols)

    ind <- (nr*(cols-1)) + rows
    mat[ind] <- 0

    mat <- rbind(mat, vec)
    rownames(mat) <- NULL

    mat
}

and here it is in action:

> set.seed(2)
> foo(mat1, ivec)
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    1    0    1    0
[4,]    0    0    0    1
[5,]    0    1    1    0

and it works when there is only one 1 in a column we want to do a swap in:

> foo(mat1, c(0,0,1,1))
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    0    1    0    0
[3,]    1    0    1    0
[4,]    0    0    0    1
[5,]    0    0    1    1

Original Answer: Assuming any value in a relevant column can be set to zero

Here is a vectorised answer, where we treat the matrix as a vector when doing the replacement. Using the example data:

mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
ivec <- c(0,1,1,0)

## Set a seed to make reproducible
set.seed(2)

## number of rows and columns of our matrix
nr <- nrow(mat1)
nc <- ncol(mat1)

## which of ivec are 1L
cols <- which(ivec == 1L)

## sample length(cols) row indices, with replacement
## so same row can be drawn more than once
rows <- sample(seq_len(nr), length(cols), replace = TRUE)

## Compute the index of each rows cols combination
## if we treated mat1 as a vector
ind <- (nr*(cols-1)) + rows
## ind should be of length length(cols)

## copy for illustration
mat2 <- mat1

## replace the indices we want with 0, note sub-setting as a vector
mat2[ind] <- 0

## bind on ivec
mat2 <- rbind(mat2, ivec)

This gives us:

> mat2
     [,1] [,2] [,3] [,4]
        1    0    0    0
        0    1    0    0
        1    0    0    0
        0    0    1    1
ivec    0    1    1    0

If I were doing this more than once or twice, I'd wrap this in a function:

foo <- function(mat, vec) {
    nr <- nrow(mat)
    nc <- ncol(mat)

    cols <- which(vec == 1L)
    rows <- sample(seq_len(nr), length(cols), replace = TRUE)

    ind <- (nr*(cols-1)) + rows
    mat[ind] <- 0

    mat <- rbind(mat, vec)
    rownames(mat) <- NULL

    mat
}

Which gives:

> foo(mat1, ivec)
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    0    1    0    0
[3,]    1    0    1    0
[4,]    0    0    0    1
[5,]    0    1    1    0

If you wanted to do this for multiple ivecs, growing mat1 each time, then you probably don't want to do that in a loop as growing objects is slow (it involves copies etc). But you could just modify the definition of ind to include the extra n rows you bind on for the n ivecs.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • Very nice and afaik the fastest solution. – Joris Meys Jul 28 '11 at 12:51
  • Correct me if I'm wrong but with your last example the output still has 2 1's in the top 4 rows of the 2nd column. – nzcoops Jul 28 '11 at 23:36
  • @nzcoops the OP is unclear whether only a 1 at random has to be changed to 0 or is any random element in a relevant column needs to be set to zero. I'll provide an alternate version that only swaps 1s to 0s – Gavin Simpson Jul 29 '11 at 08:01
  • @nzcoops Updated with a version that only swaps 1s to 0s. – Gavin Simpson Jul 29 '11 at 11:38
  • @Gavin, true, I was going off the example which showed a 1 disappearing from both columns. Nice answer :) I couldn't think of a way out of the dreaded for loop :P Then again, if you're only doing it once on a small matrix and you can knock up a for loop quickly, guess there's no harm in that. – nzcoops Jul 31 '11 at 23:45
  • 1
    @Gavin Simpson - You could use `sample.int` instead to avoid problems: `out <- ones[sample.int(length(ones), 1)]` – Tommy Aug 12 '11 at 17:30
1

You could try something like this. Having 'nrow' in there will allow you to run it multiple times with other 'I.vec's. I tried to do this in a single line with 'apply' but couldn't get a matrix to come out again.

mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I.r<-rbind(mat1,I.vec)

for(i in 1:ncol(mat.I.r))
  {
  ifelse(mat.I.r[nrow(mat.I.r),i]==1, mat.I.r[sample(which(mat.I.r[1:(nrow(mat.I.r)-1),i]==1),1), i] <- 0, "")
  }
mat.I.r
nzcoops
  • 9,132
  • 8
  • 41
  • 52