Apologies, since this question is somewhat vague and general, and is certainly not reproducible since the code is too complex. However, I suspect it could be answered by equally vague strategies of approaching these issues that are instructive and helpful.
I have coded a simulator which has a main, parallelized loop iterating through parameter values, loading them to the model and running them n
times.
The issue: while the code generally works well for smaller problem dimensions, it fails at a significant frequency at higher dimensions (particularly higher n
); most parameter values execute fine and output is produced, but once in a while there is no file produced. The 'post processing' then fails because of missing files.
What I know: Rerunning the function, different parameter values are effected, so this is not due to invalid parameter values, but seemingly a random failure. There have also been some runs without any problems. There was once an error message about failure to allocate vector of size xyz
.
What I tried: traceback()
seems to focus on the failure at the end of the sim (a symptom) but doesn't find the real cause. I also tried adding a while
loop conditional on the existence of the output file, what would rerun the parameter value if it failed (see below, commented out). This seemed to help a little, but not completely.
The above leads me to suspect some threads crash somehow, and then fail to output any of the parameters assigned to it.
Questions: What strategies would you use to diagnose this issue? What methods can one implement to make such a simulation more robust to errors (diagnosed or otherwise)? What kind of operations might I be doing what can cause such failures?
Sketch of the Sim. Loop:
library(foreach)
library(doMC)
Simulator <- function(params,...)
{
[... Pre Processing...]
times<-foreach(i=1:length(params)) %dopar%
{
# while(!file.exists(paste0("output",i,".rds"))) {
run <-list()
run$par <-params[[i]]
run$data <-list()
foreach(j=1:n) %do% # Run Sim n times with params
{
run$data[[j]] <- SimRun(params[[i]],...)
}
# Combine into single array and label dimensions
run$data <- abind(run$data,along=4)
dimnames(run$data)<- headers
# Compute statistics and save them
run$stats <- Stats(run$data,params[[i]])
saveRDS(run,paste0("output",i,".rds"))
# }
[...etc...]
}
[... Post Processing....]
}
Thanks for your patience!