My problem is conceptually simple, and I am looking for a computationally efficient solution to it (my own attempt is attached at the end).
Suppose we have a potentially very large sparse matrix like the one on the left below, and we want to 'name' every area of contiguous non-zero elements with a separate code (see the matrix on the right):
1 1 1 . . . . .          1 1 1 . . . . .
1 1 1 . 1 1 . .          1 1 1 . 4 4 . .
1 1 1 . 1 1 . .          1 1 1 . 4 4 . .
. . . . 1 1 . .   --->   . . . . 4 4 . .
. . 1 1 . . 1 1          . . 3 3 . . 7 7
1 . 1 1 . . 1 1          2 . 3 3 . . 7 7
1 . . . 1 . . .          2 . . . 5 . . .
1 . . . . 1 1 1          2 . . . . 6 6 6
In my application the contiguous elements form rectangles, lines or single points, and they can only touch each other at their vertices (i.e. there are no irregular, non-rectangular areas in the matrix).
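For reproducibility, here is a minimal sketch that builds the left-hand matrix above as a sparse pattern matrix. The names `m` and `X` are only illustrative, and it assumes the Matrix package, where calling `sparseMatrix()` without `x` should give a pattern matrix (`ngCMatrix`):

```r
library(Matrix)

# Dense copy of the left-hand example matrix (1 = non-zero, 0 = empty)
m <- matrix(c(
  1, 1, 1, 0, 0, 0, 0, 0,
  1, 1, 1, 0, 1, 1, 0, 0,
  1, 1, 1, 0, 1, 1, 0, 0,
  0, 0, 0, 0, 1, 1, 0, 0,
  0, 0, 1, 1, 0, 0, 1, 1,
  1, 0, 1, 1, 0, 0, 1, 1,
  1, 0, 0, 0, 1, 0, 0, 0,
  1, 0, 0, 0, 0, 1, 1, 1
), nrow = 8, byrow = TRUE)

# Positions of the non-zero cells, then a sparse pattern matrix built from them
idx <- which(m == 1, arr.ind = TRUE)
X <- sparseMatrix(i = idx[, 1], j = idx[, 2], dims = dim(m))
```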
The solution I imagined is to match the row and column indices of the sparse matrix representation to a vector holding the appropriate values (the 'name' codes). My solution uses several for loops; it works fine for small to medium matrices but quickly gets stuck in the loops as the dimensions of the matrix become large (>1000). This is probably because I'm not very advanced in R programming: I couldn't find any computational trick or function to solve it better.
Can anybody suggest a computationally more efficient way to do that in R?
My solution:
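library(Matrix)

mySolution <- function(X) {
  if (!is(X, "ngCMatrix")) stop("Input must be a sparse pattern matrix (ngCMatrix)")
  # Row and column indices of the non-zero cells, in column-major order
  ind <- which(X == TRUE, arr.ind = TRUE)
  r <- ind[, 1]
  c <- ind[, 2]
  lr <- nrow(ind)
  # Pass 1: vertically adjacent cells in the same column share a code
  # (the column check keeps a run from spilling into the next column)
  bk <- numeric(lr)
  for (i in seq_len(lr)) {
    if (i == 1) {
      bk[i] <- 1
    } else if (c[i] == c[i - 1] && r[i] - r[i - 1] == 1) {
      bk[i] <- bk[i - 1]
    } else {
      bk[i] <- bk[i - 1] + 1
    }
  }
  # Pass 2: copy each cell's code to its right-hand neighbour (quadratic scan)
  for (LOOP in seq_len(max(lr - 1, 0))) {
    tr <- r[LOOP]
    tc <- c[LOOP]
    for (j in (LOOP + 1):lr) {
      if (r[j] == tr && c[j] == tc + 1) bk[j] <- bk[LOOP]
    }
  }
  # Pass 3: renumber the codes as 1, 2, ... in order of first appearance
  bk <- match(bk, unique(bk))
  sparseMatrix(i = r, j = c, x = as.numeric(bk), dims = dim(X))
}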
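One direction that might avoid the explicit loops is sketched below. It is not a general connected-component algorithm: it relies on the assumption stated earlier that every area is a rectangle and that areas touch only at vertices, so each cell can be keyed by the top row of its vertical run and the left column of its horizontal run. The function name `label_rectangles` is hypothetical, and the sketch reads the `i` and `p` slots of a column-compressed matrix from the Matrix package directly:

```r
library(Matrix)

label_rectangles <- function(X) {
  stopifnot(is(X, "CsparseMatrix"))
  i <- X@i + 1L                              # 1-based row index of each non-zero cell
  j <- rep(seq_len(ncol(X)), diff(X@p))      # column index, expanded from the column pointers
  n <- length(i)
  if (n == 0L) {
    return(sparseMatrix(i = integer(0), j = integer(0), x = numeric(0), dims = dim(X)))
  }
  # Cells are stored column by column: a vertical run starts wherever the
  # previous cell is not the one directly above in the same column
  new_v <- c(TRUE, j[-1] != j[-n] | i[-1] != i[-n] + 1L)
  top <- i[new_v][cumsum(new_v)]             # top row of each cell's vertical run
  # Same idea along rows, after sorting the cells row by row
  o <- order(i, j)
  io <- i[o]
  jo <- j[o]
  new_h <- c(TRUE, io[-1] != io[-n] | jo[-1] != jo[-n] + 1L)
  left <- integer(n)
  left[o] <- jo[new_h][cumsum(new_h)]        # left column of each cell's horizontal run
  # Under the rectangle assumption, (top, left) identifies one area
  key <- (left - 1) * nrow(X) + top          # computed in doubles to avoid integer overflow
  lab <- match(key, unique(key))             # codes 1, 2, ... in order of first appearance
  sparseMatrix(i = i, j = j, x = as.numeric(lab), dims = dim(X))
}

# Usage on the 8 x 8 example from the top of the question
rows <- c(1, 2, 3, 6, 7, 8, 1, 2, 3, 1, 2, 3, 5, 6, 5, 6,
          2, 3, 4, 7, 2, 3, 4, 8, 5, 6, 8, 5, 6, 8)
cols <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
          5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8)
X <- sparseMatrix(i = rows, j = cols, dims = c(8, 8))
res <- label_rectangles(X)
```

Everything here is a vectorised pass or a sort over the non-zero cells, so the cost should grow roughly as n log n in their number rather than quadratically. If the areas were not guaranteed to be rectangles, this keying would no longer be valid and a proper connected-component labelling would be needed instead.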
Thanks in advance for any help or pointer.