
In R, I'm using lmrob from the robustbase package to fit a simple linear model of the form:

lmrob(value ~ t + as.factor(r) + as.factor(c) + 0, data=subs, setting="KS2014")

This works fine 95% of the time, but every once in a while the call fails and gives this error:

Error: DGELS: weighted design matrix not of full rank (column XX).

where XX is a varying column number. I can fix this by simply executing the lmrob command repeatedly until it finally succeeds; usually this takes 1-2 tries. Note that I am not changing any of the inputs when I rerun lmrob.

Does anyone know of a setting I can change to avoid having to manually re-run the lmrob command to get it to work? I've tried changing some of the control parameters without success:

lm_control <- lmrob.control(setting="KS2014")
lm_control$max.it <- 1000
lm_control$nResample <- 1500
Good Eats

2 Answers

-1

This is simple. Wrap the statement in tryCatch and repeat it until it comes out clean. You may need to adjust the class of the result, see (str(class(result))).

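library(robustbase)

pass <- FALSE

while (!pass) {
  # tryCatch returns the condition object instead of stopping on an error or warning
  result <- tryCatch(lmrob(value ~ t + as.factor(r) + as.factor(c) + 0, data=subs, setting="KS2014"),
           error = function(e) e, warning = function(w) w)

  # a caught error has class c("simpleError", "error", "condition"),
  # so test with inherits() rather than comparing the whole class vector
  if (!inherits(result, c("error", "warning"))) {
    pass <- TRUE
  }
}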

For fun, you can add a cap on how many times it can repeat it so you're not stuck in an infinite loop for crappy datasets.
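A capped version of the loop could look something like the sketch below. The helper name `retry_fit` and its arguments are made up for illustration and are not part of robustbase. Unlike the loop above, it only retries on errors, not on warnings.

```r
# Hypothetical helper (not part of robustbase): call fit_fun() until it
# returns without an error, giving up after max_tries attempts.
retry_fit <- function(fit_fun, max_tries = 10) {
  for (i in seq_len(max_tries)) {
    # On error, tryCatch hands back the condition object instead of stopping
    result <- tryCatch(fit_fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("attempt ", i, " failed: ", conditionMessage(result))
  }
  stop("no successful fit after ", max_tries, " attempts")
}

# Usage with the model from the question (assumes robustbase is loaded
# and the data frame `subs` exists):
# fit <- retry_fit(function() lmrob(value ~ t + as.factor(r) + as.factor(c) + 0,
#                                   data = subs, setting = "KS2014"))
```

Passing the fit as a function keeps the retry logic separate from the model call, so the same helper works for any fitting routine.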

Roman Luštrik
  • Thanks for the response Roman. I'm testing your solution now. So far it looks promising. However, I consider this approach to be a "workaround". I have a suspicion that there is likely a way to make lmrob perform as expected without resorting to workarounds. I'll wait a little bit to see if such a solution is posted by others. If not, I will accept your answer. Thanks again. – Good Eats Mar 27 '17 at 15:58
  • I finished testing the solution suggested above using a cap of 10 iterations to prevent infinite loops. Interestingly, if lmrob failed the first time, it would fail identically for all subsequent iterations. Based on this, I learned that in order to get lmrob to succeed when re-run, it needs to be re-run in a new R instance. If I re-run a failed dataset in a new R instance, it will succeed after 1 or 2 tries. – Good Eats Mar 27 '17 at 20:36
  • @GoodEats far out. Have you tried changing the controls? After each fail, you could perhaps change some parameters from `lmrob.control`. Sorry that I can't be of much use as to which parameters to change to what. I would approach this with trial and error. – Roman Luštrik Mar 28 '17 at 07:40
  • Thanks Roman. I have tried changing the lmrob.control object without success. I also tried changing the random seed after each iteration of the loop above to see if that could simulate starting a new instance of R, but that didn't help either. In the end I settled on another hack which uses a non-robust fit if the robust fit fails. Not satisfying at all -- but hey, it's R. – Good Eats Mar 30 '17 at 12:40
  • @GoodEats you have an interplay of code and data at hand. Not a trivial problem if it's of any consolation. :) – Roman Luštrik Mar 31 '17 at 07:56
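
The fallback mentioned in the comments (use a non-robust fit when the robust one fails) might be sketched roughly as follows. The helper `fit_with_fallback` is a hypothetical name, and pairing lmrob with lm in the usage lines is an assumption about what that hack looked like.

```r
# Hypothetical helper: try the robust fit first; if it throws an error,
# fall back to a second fitting function and record which one was used.
fit_with_fallback <- function(robust_fun, fallback_fun) {
  fit <- tryCatch(robust_fun(), error = function(e) NULL)
  if (is.null(fit)) {
    list(fit = fallback_fun(), robust = FALSE)
  } else {
    list(fit = fit, robust = TRUE)
  }
}

# Usage with the model from the question (assumes robustbase is loaded
# and the data frame `subs` exists):
# f <- value ~ t + as.factor(r) + as.factor(c) + 0
# out <- fit_with_fallback(function() lmrob(f, data = subs, setting = "KS2014"),
#                          function() lm(f, data = subs))
# out$robust  # FALSE means the ordinary least-squares fit was used
```

Keeping the `robust` flag in the result makes it possible to tell afterwards which fits are least-squares fallbacks, since those are not protected against outliers.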
-1

After reading more about robust linear regression, I think I better understand the source of the problem. As outlined in this paper and alluded to in the docs for lmrob.control, the first step of a robust regression involves subsampling the input data. With many categorical predictors, a subsample is more likely to contain collinear columns, which yields a matrix that is not of full rank and hence the reported "DGELS" error. The "KS2011" and "KS2014" settings in lmrob allow you to specify that the algorithm should take extra care to avoid collinear columns when picking a subsample. However, when the number of data points is not much bigger than the number of variables in the model (as is sometimes the case for my application), the algorithm still cannot find a non-singular data subset from the initial starting point, and it still fails. This doesn't explain why restarting in a new R session can help lmrob find a non-singular subset, but it does explain why this is a difficult problem that often throws errors.
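
To illustrate the point, here is a small base-R sketch on made-up data (nothing here comes from my real dataset). It draws plain random subsets of rows from a design matrix with two factors and counts how often the subset is rank deficient. This only mimics naive subsampling; it is not a reproduction of what lmrob does internally.

```r
set.seed(1)  # arbitrary seed, only so the illustration is repeatable

# Made-up data: 12 rows, one numeric predictor and two 3-level factors
d <- data.frame(t = rnorm(12),
                r = factor(rep(1:3, times = 4)),
                c = factor(rep(1:3, each = 4)))

# Same model structure as in the question: no intercept, two factors
X <- model.matrix(~ t + r + c + 0, data = d)
p <- ncol(X)  # number of coefficients to estimate

# Draw many random subsets of p rows and flag the rank-deficient ones
singular <- replicate(1000, {
  rows <- sample(nrow(X), p)
  qr(X[rows, , drop = FALSE])$rank < p
})

mean(singular)  # fraction of subsets that could not be used for a fit
```

Whenever a subset happens to miss a factor level entirely, the matching dummy column is all zeros, so the subset cannot have full rank. With few rows per level this happens often.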

Good Eats