0

I am constructing bootstrap function that will estimate logistic regression on each bootstrapped data set and do some additional calculations. For small samples sometimes it may produce warnings such as

Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

I would like to indicate such warnings for each bootstrap iteration. Here is my code:

library(parallel)

n=length(y);
dataN=as.matrix((cbind(y,data)));
cl <- makeCluster(detectCores())
clusterExport(cl,c("dataN"), envir=environment())
clusterExport(cl,c("n"), envir=environment())
clusterEvalQ(cl,library(glmnet))
q=2000;
repl1=parLapply(cl=cl,1:q,function(i,dataA=dataN,smpl=n,...){
  dataB <- dataA[sample(nrow(dataA),size=smpl,replace=TRUE),]
  p=dim(dataB)[2];p
  forname=c(colnames(dataB)[2:p]);
  formulaZS=as.formula(paste("y~",paste(forname,collapse="+"),sep=""));
  assign("last.warning", NULL, envir = baseenv())
  m=glm(formulaZS,family=binomial,data=data.frame(dataB))
  coefs=summary(m)$coef[,1];coefs
  a=length(attr(warnings(),"names"))
  indicator=ifelse(a==0,0,1);
  return(list(coefs=coefs,indicator=indicator))
})  
stopCluster(cl)

beta=t(matrix(unlist(lapply(repl1, "[[", "coefs")),nrow=p+1));
beta.est=apply(beta,2,mean);beta.est
indicator=t(matrix(unlist(lapply(repl1, "[[", "indicator")),nrow=1));
sum(indicator)

Before each regression estimation I'm resetting warnings by

assign("last.warning", NULL, envir = baseenv())

and then checking if there was warning by calculating the length of warnings() names. This works if I'm doing it manually, without application of parLapply. However, if I'm running this function it produces the vector indicator with only zeroes even if there were warnings.

Data generating code:

fundata=function(n,p,corr){
      R = matrix(rep(corr,p*p),nrow=p)+(1-corr)*diag(p);
      R <- round(((R * lower.tri(R)) + t(R * lower.tri(R))),2)
      diag(R) <- 1 
      U = t(chol(R))
      nvars = dim(U)[1]
      random.normal = matrix(rnorm(nvars*n,mean=0,sd=1), nrow=nvars, ncol=n);
      X = U %*% random.normal
      newX = t(X)
      data = as.data.frame(newX)
      names(data)<-sprintf("V%d",1:p)
      return(data=data) 
}
n=50;
p=5;
corr=0.5;
seed=123;
#Generate independent variables
data=fundata(n,p,corr);
#Generate dependent variable
p=dim(data)[2];p
  true=c(-1,0,0,0.2,0.675,-1.5)
  z=as.matrix(cbind(rep(1,n),data[,1:p]))%*%true;
  pr = 1/(1+exp(-z));
  y = rbinom(n=n, size=1, prob=t(pr))
  dataN=as.matrix((cbind(y,data)))
Art
  • 43
  • 8
  • A work around for your specific situation is to check if any of the coefs are larger than 12-15 in absolute terms. On the logit scale, this corresponds to about infinity, indicating that the model has run into problems. – Hong Ooi Sep 04 '17 at 23:03
  • Can you provide the data so that we can run your code? – F. Privé Sep 05 '17 at 08:26
  • @F.Privé Sure. I have added it. – Art Sep 07 '17 at 02:06
  • @HongOoi I did that before, but it is very inaccurate solution. I would like to know in general if it is possible to track warnings inside of parallel programming functions, tag them and respond with a suitable strategy. – Art Sep 07 '17 at 02:06
  • Error in matrix(rep(corr, p * p), nrow = p) : argument "corr" is missing, with no default – F. Privé Sep 07 '17 at 06:09
  • @F.Privé Fixed. The original code is much longer, so I tried to leave only the necessary code and forgot to define correlation. But you can put any non-zero correlation smaller than |0.7|...it doesn't really matter. – Art Sep 08 '17 at 01:06
  • Error in as.matrix(cbind(rep(1, n), data[, 1:p])) %*% true : non-conformable arguments. Is it really that difficult to open a new session and try running your code before posting it here? – F. Privé Sep 08 '17 at 06:52
  • @F.Privé Fixed. – Art Sep 08 '17 at 18:36

0 Answers0