I have the following function that I need to mapply over a list of more than 1500 large matrices (Z) and a list of vectors (p) of the same length. However, I get an error that some matrices are singular, as I already posted here. Here is my function:

kastner <- function(item, p) {
  print(item)                         # show which matrix is being processed
  imp <- rowSums(Z[[item]])           # row sums of Z[[item]]
  exp <- colSums(Z[[item]])           # column sums of Z[[item]]
  x  <- p + imp
  ac <- p + imp - exp
  einsdurchx <- 1 / as.vector(x)      # 1/x, with divisions by zero set to 0
  einsdurchx[is.infinite(einsdurchx)] <- 0
  A <- Z[[item]] %*% diag(einsdurchx)
  R <- solve(diag(length(p)) - A) %*% diag(p)  # fails when I - A is singular
  C <- ac * einsdurchx
  R_bar <- diag(as.vector(C)) %*% R
  rR_bar <- round(R_bar)
  return(rR_bar)
}

and my mapply command, which also prints the name of the matrix currently being processed:

KASTNER <- mapply(kastner, names(Z), p, SIMPLIFY = FALSE) 

In order to overcome the singularity problem, I want to add a small amount of jitter to the singular matrices. The problem starts in line 9 of the function, R = solve(diag(length(p))-A) %*% diag(p), because the term (diag(length(p))-A) becomes singular and cannot be solved. I tried to add jitter to all Z matrices in the first line of the function using Z <- lapply(Z, function(x) jitter(x, factor = 0.0001, amount = NULL)), but this jitter is very small and still produces errors.

Therefore my idea is to check with if/else (or something similar) whether the matrix diag(length(p))-A is singular (maybe using eigenvectors to check for collinearity), apply jitter only to those matrices, and otherwise perform the solve command as it is. Any ideas how to implement this in the function? To make the idea concrete, here is a minimal, untested sketch of such a pre-check (it assumes the QR rank of I - A as the singularity test; kappa() or Matrix::rcond() would be alternatives). Thanks
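# Sketch of the singularity pre-check inside the function (assumption:
# qr()$rank < nrow() flags a (near-)singular matrix; not tested on real data)
M <- diag(length(p)) - A
while (qr(M)$rank < nrow(M)) {
  Z[[item]] <- jitter(Z[[item]], factor = 0.0001)  # jitter only the bad ones
  A <- Z[[item]] %*% diag(einsdurchx)
  M <- diag(length(p)) - A
}
R <- solve(M) %*% diag(p)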

Here is some example data, although it does not exhibit the singularity problem, as I was not able to reproduce the error in line 9:

Z <- list("111.2012"= matrix(c(0,0,100,200,0,0,0,0,50,350,0,50,50,200,200,0),
                                 nrow = 4, ncol = 4, byrow = T),
          "112.2012"= matrix(c(10,90,0,30,10,90,0,10,200,50,10,350,150,100,200,10),
                                  nrow = 4, ncol = 4, byrow = T))
p <- list("111.2012"=c(200, 1000, 100, 10), "112.2012"=c(300, 900, 50, 100))
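For illustration only, here is a hypothetical, constructed input (not from my real data) that does make diag(length(p)) - A exactly singular: if p is all zeros and Z is balanced (row sums equal column sums), every column of A sums to 1, so I - A loses full rank:

# Constructed example reproducing the singularity (hypothetical data)
Zsing <- matrix(c(0, 100, 100, 0), nrow = 2, ncol = 2)  # balanced: rowSums == colSums
psing <- c(0, 0)
A <- Zsing %*% diag(1 / (psing + rowSums(Zsing)))
solve(diag(length(psing)) - A)  # Error: system is exactly singular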

Edit: a small amount of jitter shouldn't be problematic in my data, as probably more than 80% of the entries in my matrices are zeros and the rest are large values. I am only interested in those large values, but the large share of 0s is probably the reason for the singularity; they are nevertheless needed.

N.Varela

1 Answer

Since you didn't provide a working example I couldn't test this easily, so the burden of proof is on you. :) In any case, it should be a starting point for further tinkering. Comments in the code.

kastner <- function(item, p) {
  print(item)
  imp <- rowSums(Z[[item]])
  exp <- colSums(Z[[item]])
  x  <- p + imp
  ac <- p + imp - exp
  einsdurchx <- 1 / as.vector(x)
  einsdurchx[is.infinite(einsdurchx)] <- 0

  # start a chunk that repeats until you get a valid result
  do.jitter <- TRUE # bureaucracy
  while (do.jitter) {
    # run the code as usual
    A <- Z[[item]] %*% diag(einsdurchx)
    # catch any possible errors; you could even catch the "singularity" error
    # specifically here by specifying error = function(e) e
    R <- tryCatch(solve(diag(length(p)) - A) %*% diag(p),
                  error = function(e) "jitterme")
    # if you were able to solve(), and the result is a matrix (careful if it's a vector!)...
    if (is.matrix(R)) {
      # ... turn the while loop off
      do.jitter <- FALSE
    } else {
      # ... else apply some jitter and repeat, constructing A from a jittered Z[[item]]
      Z[[item]] <- jitter(Z[[item]])
    }
  }
  C <- ac * einsdurchx
  R_bar <- diag(as.vector(C)) %*% R
  rR_bar <- round(R_bar)
  return(rR_bar)
}
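
One design note: the while loop keeps jittering until solve() succeeds, so a stubborn matrix could keep it spinning for a long time. If that is a concern, a capped variant (my sketch, with a hypothetical limit of 100 attempts) would fail loudly instead:

# Sketch: bound the number of jitter attempts (hypothetical cap of 100)
R <- NULL
for (attempt in seq_len(100)) {
  A <- Z[[item]] %*% diag(einsdurchx)
  R <- tryCatch(solve(diag(length(p)) - A) %*% diag(p),
                error = function(e) NULL)
  if (is.matrix(R)) break    # solvable, stop jittering
  Z[[item]] <- jitter(Z[[item]])
}
if (!is.matrix(R)) stop("matrix ", item, " still singular after 100 jitter attempts")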
Roman Luštrik
  • Unfortunately I can not really test if it's right as it takes too much time to run. Using microbenchmark, the script is around four times slower than the old one for this small example. Do you have an idea how to reduce computing time? – N.Varela Dec 17 '15 at 11:21
  • Four times slower - is that in milliseconds? :) – Roman Luštrik Dec 17 '15 at 13:34
  • nanoseconds: ~25 (original) to ~95 (your solution), for the small example data I provide in this question. For my original data I broke off after ~10 min, as it had calculated only ~50 matrices (of 1500) and I ran out of memory – N.Varela Dec 17 '15 at 13:37
  • @N.Varela It would be best if you address the memory issue first, and worry about performance later. – Roman Luštrik Dec 17 '15 at 13:42
  • I only have memory problems with the mapply() command on this function, as I delete all other objects besides Z and p from the workspace. Any ideas? – N.Varela Dec 17 '15 at 13:50
  • @N.Varela please address this issue in a separate question. Don't forget a minimal working example. – Roman Luštrik Dec 17 '15 at 13:54