
Apologies in advance: this question is somewhat vague and general, and it is not reproducible because the code is too complex to post. Still, I suspect it can be answered with equally general strategies for approaching this kind of issue, which would be instructive and helpful.

I have coded a simulator whose main, parallelized loop iterates through parameter values, loads each one into the model and runs it n times.

The issue: the code generally works well for smaller problem dimensions, but it fails fairly often at higher dimensions (particularly higher n). Most parameter values execute fine and produce output, but once in a while no output file is written. The post-processing then fails because of the missing files.

What I know: When I rerun the function, different parameter values are affected, so this is not due to invalid parameter values but seemingly a random failure. Some runs have completed without any problems. On one occasion there was an error message about failure to allocate a vector of size xyz.
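Given the allocation error mentioned above, a rough estimate of the memory needed per worker may help decide whether memory is a plausible cause. The following is only a sketch: the array dimensions, n, and the worker count are assumed values, and `one_run` is a hypothetical stand-in for a single SimRun result.

```r
# Hypothetical stand-in for one SimRun() result: a 3-d numeric array.
# The real dimensions are not given in the question; these are assumed.
one_run <- array(0, dim = c(50, 20, 10))

n       <- 1000   # assumed number of repetitions per parameter value
workers <- 8      # assumed number of parallel workers

bytes_per_run <- as.numeric(object.size(one_run))

# The list of n runs and the array built from it by abind() exist at the
# same time, so the peak per worker is at least roughly twice the data size.
peak_per_worker <- 2 * n * bytes_per_run
peak_total      <- workers * peak_per_worker

cat(sprintf("approx. peak per worker: %.1f MB\n", peak_per_worker / 1024^2))
cat(sprintf("approx. peak in total:   %.1f MB\n", peak_total / 1024^2))
```

If the total comes close to the machine's RAM, failures that grow more frequent with n and that hit different parameter values on each run would be consistent with workers running out of memory.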

What I tried: traceback() seems to focus on the failure at the end of the sim (a symptom) but doesn't find the real cause. I also tried adding a while loop conditional on the existence of the output file, which would rerun the parameter value if it failed (see below, commented out). This seemed to help a little, but not completely.
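Since traceback() only shows the downstream symptom, one option might be to record errors at the point where they occur, inside each iteration. Below is a minimal sketch using base R's tryCatch; `run_one`, `safe_run`, and the log file are hypothetical names, and `lapply` stands in for the parallel loop so that the example is self-contained.

```r
# Hypothetical stand-in for the body of one parallel iteration.
# It fails for i == 3 to imitate a sporadic error.
run_one <- function(i) {
  if (i == 3) stop("simulated failure")
  i^2
}

# Wrap one iteration so that an error is written to a log instead of lost.
safe_run <- function(i, log_file) {
  tryCatch(
    run_one(i),
    error = function(e) {
      msg <- sprintf("iteration %d failed: %s", i, conditionMessage(e))
      cat(msg, "\n", file = log_file, append = TRUE, sep = "")
      NULL  # marker for "no result"
    }
  )
}

log_file <- tempfile(fileext = ".log")
results  <- lapply(1:5, safe_run, log_file = log_file)

# Indices of the iterations that produced no result
failed <- which(vapply(results, is.null, logical(1)))
print(failed)
print(readLines(log_file))
```

This only catches errors raised inside R, such as a failed vector allocation. A worker process that is killed by the operating system would leave no log entry, which would itself narrow down the cause.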

The above leads me to suspect that some threads crash somehow and then fail to produce output for any of the parameter values assigned to them.

Questions: What strategies would you use to diagnose this issue? What methods can one implement to make such a simulation more robust to errors (diagnosed or otherwise)? What kind of operations might I be doing that can cause such failures?

Sketch of the Sim. Loop:

library(foreach)
library(doMC)
library(abind)   # provides abind(), used below

Simulator <- function(params,...)
{
    [... Pre Processing...]

    times<-foreach(i=1:length(params)) %dopar%
    {
    # while(!file.exists(paste0("output",i,".rds"))) {
        run <-list()
        run$par <-params[[i]]
        run$data <-list()

        foreach(j=1:n) %do% # Run Sim n times with params
        {
            run$data[[j]] <- SimRun(params[[i]],...)
        }

        # Combine into single array and label dimensions
        run$data <- abind(run$data,along=4)
        dimnames(run$data)<- headers

        # Compute statistics and save them
        run$stats <- Stats(run$data,params[[i]])
        saveRDS(run,paste0("output",i,".rds"))

    # }

        [...etc...]
    }
    [... Post Processing....]
}
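The commented-out while loop in the sketch above would retry indefinitely inside the worker. An alternative might be to check for missing output files after the parallel loop and rerun only those, with a bounded number of attempts. The sketch below assumes this approach; `run_and_save`, `out_file`, and `max_retries` are hypothetical names, and plain for loops stand in for the parallel loop.

```r
out_dir <- tempfile("sim_out_")
dir.create(out_dir)
out_file <- function(i) file.path(out_dir, paste0("output", i, ".rds"))

# Hypothetical stand-in for one full iteration. On the first attempt for
# i == 2 it writes nothing, to imitate a lost output file.
attempts <- integer(4)
run_and_save <- function(i) {
  attempts[i] <<- attempts[i] + 1L
  if (i == 2 && attempts[i] == 1L) return(invisible(FALSE))
  saveRDS(list(par = i), out_file(i))
  invisible(TRUE)
}

ids <- 1:4
for (i in ids) run_and_save(i)   # stands in for the parallel loop

# Rerun only the parameter indices whose output file is missing,
# giving up after max_retries passes.
max_retries <- 3
for (attempt in seq_len(max_retries)) {
  todo <- ids[!file.exists(out_file(ids))]
  if (length(todo) == 0) break
  message("retrying: ", paste(todo, collapse = ", "))
  for (i in todo) run_and_save(i)   # sequential rerun
}

still_missing <- ids[!file.exists(out_file(ids))]
```

Running the retries sequentially means each rerun has the machine's memory to itself, so an iteration that still fails there is less likely to be a victim of memory pressure from other workers.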

Thanks for your patience!

Ixxie
  • Is there anything stochastic in your code? If that's the case, you should use package doRNG to make it reproducible, which would allow you to reproduce the specific iteration where the code fails. – Roland Aug 21 '14 at 09:26
  • Instead of doRNG, I seeded each thread identically to make it reproducible. The simulation's failure is still irreproducible, as I suspected; the error doesn't seem to be due to particular parameter values but occurs seemingly at random. I get the feeling this may be caused by some kind of memory bottleneck. – Ixxie Aug 25 '14 at 08:24
