I am using an MCMC (Markov chain Monte Carlo) code to test a model and check whether it is valid.
I launch this MCMC with MPI for Python using 64 processors.
This MCMC code is called autoemcee: https://johannesbuchner.github.io/autoemcee/index.html
Once convergence is reached, I get the following error, which seems to come from mpi4py:
In particular, the error is raised at line 363 of autoemcee, in this part:
    if converged:
        # finally, gelman-rubin diagnostic on chains
        chains = np.asarray([sampler.get_chain(flat=True) for sampler in self.samplers])
        if self.use_mpi:
            recv_chains = self.comm.gather(chains, root=0)  # <-- line 363
            chains = np.concatenate(self.comm.bcast(recv_chains, root=0))
        assert chains.shape == (num_chains, num_steps * num_walkers, self.x_dim), (chains.shape, (num_chains, num_steps * num_walkers, self.x_dim))
        rhat = arviz.rhat(arviz.convert_to_dataset(chains)).x.data
        if self.log:
            self.logger.info("rhat chain diagnostic: %s (<%.3f is good)", rhat, rhat_max)
        converged = np.all(rhat < rhat_max)
        if self.use_mpi:
            converged = self.comm.bcast(converged, root=0)
I don't understand why self.comm.gather(chains, root=0) raises an error. Moreover, the SystemError says that a negative size was passed to PyBytes_FromStringAndSize; I don't understand what this means.
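One thing I can check is whether the amount of data being gathered is simply very large. The sketch below is only a rough estimate and rests on assumptions I have not confirmed: that the chains are stored as float64, that pickling adds negligible overhead, and that the pickle-based gather is limited by a signed 32-bit byte count. The function names and the example numbers are hypothetical, not taken from autoemcee or from my actual run.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit byte count can hold


def estimate_gather_bytes(num_chains, num_steps, num_walkers, x_dim,
                          num_ranks, itemsize=8):
    """Rough number of bytes the root rank receives from gather.

    Assumes each rank sends an array of shape
    (num_chains, num_steps * num_walkers, x_dim) with `itemsize` bytes per
    element (8 for float64). Pickle overhead is ignored.
    """
    per_rank = num_chains * num_steps * num_walkers * x_dim * itemsize
    return per_rank * num_ranks


def exceeds_int32(nbytes):
    """True if nbytes would not fit in a signed 32-bit count."""
    return nbytes > INT32_MAX


if __name__ == "__main__":
    # Hypothetical run parameters, for illustration only.
    total = estimate_gather_bytes(num_chains=4, num_steps=2000,
                                  num_walkers=100, x_dim=10, num_ranks=64)
    print(total, exceeds_int32(total))
```

If the estimate for the real run is above 2**31 - 1 bytes, that would at least be consistent with a size overflowing into a negative number, although I cannot confirm that this is the cause.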
Has anyone run into this kind of error with autoemcee and managed to work around it?