1

I'm trying to get the effective sample size for a 2D mcmc chain, using pymc3 and arviz

import pymc3 as pm3
!pip install arviz 
import arviz as az

ess = az.ess(samples)

The above code works for 1D, but not for 2D, and I see there is a az.convert_to_dataset that might help, but I can't figure out how to use it?

Samples would be an N x 2 array and it should just give a single number as the output

Thanks!

  • "It should give a single number": are you sure? The ESS is parameter-specific. The ESS can be (wildly) different for different model parameters in the same chain of MCMC samples. – Limey Jul 17 '20 at 15:51
  • Ah okay, so I want a single number per parameter and I can do the working 1D ess on the relevant part of the sample array - ace, thanks! – Lizardinablizzard Jul 20 '20 at 10:59
  • @Lizardinablizzard assuming you have `N` draws from 2 parameters you can prepend a dimension with `[None, ...]` or with `np.expand_dims` to comply with the convention I explained below – OriolAbril Jul 21 '20 at 17:49
  • Also, tangentially related question, are you using PyMC3 to generate the samples? If so, have you tried `arviz.from_pymc3`? – OriolAbril Jul 21 '20 at 17:49
  • 1
    I've written my own mcmc sampler, and then I get the accepted 2D sample points as an output of size N x 2, so maybe I need to expand dims? – Lizardinablizzard Jul 22 '20 at 16:19
  • I've written my own mcmc sampler, and then I get the accepted 2D sample points as an output of size N x 2, so maybe I need to expand dims? – Lizardinablizzard Jul 22 '20 at 16:19
  • I have updated the answer with the info from this comment, I hope it starts making things clear :) – OriolAbril Jul 22 '20 at 16:30

1 Answers1

1

When working with arrays, ArviZ assumes the following shape convention:

  • 1d array represents the draws of a single chain of a scalar variable: (draw,)
  • 2d array represents the draws of multiple chains of a scalar variable: (chain, draw)
  • 3d+ array represents the draws of multiple chains of multidimensional variables: (chain, draw, *shape)

I am not sure why the 2d case is not working for you, I suspect it could be due to not having enough draws to calculate ess.

To make sure that your dimensions are being correctly interpreted, I would recommend doing idata = az.convert_to_inference_data(ary) and then checking idata.posterior to see the dimensions of the generated object. You can then call az.ess(idata) to get the effective sample size.


EDIT: If I understood your comments correctly, you are generating an array with shape (draw=N, parameter_dim=2) as you are only sampling a single chain. As this is a 2d array, it would be interpreted as having N chains and 2 draws which should print a warning of having more chains than draws. You can reshape the array to match ArviZ convention with:

idata = az.convert_to_inference_data(np.expand_dims(samples, 0))
# or what is the same (we just choose the name of the variable)
idata = az.from_dict({"position": np.expand_dims(samples, 0)})

which will generate a (1, N, 2) array whose dimensions will be understood by ArviZ. I have already added the conversion to InferenceData too as having an InferenceData will allow you to call any ArviZ function without having to care about dimensions any more.


If your array were (2, N), adding a transpose before expanding the axis should solve the problem:

idata = az.convert_to_inference_data(np.expand_dims(samples.T, 0))
OriolAbril
  • 7,315
  • 4
  • 29
  • 40
  • Thanks @oriolabril, so I think I needed to take the transpose of my samples to get it into (chain, draw) format, otherwise I get NaN. But now I get an answer of 2.61, when I have ran an MCMC chain with reasonable values of 10000 samples, so I expect something more like 4000? I can see when I plot it, there has been reasonable exploration of the parameter space. When I print idata.posterior, it says chain is 0, 1, draw is 0,....,9999, and x is the sample values? – Lizardinablizzard Jul 22 '20 at 15:43
  • are you sampling two chains? Or are you sampling 1 chain with 2 parameters? It is hard to guess what is happening without a minimal example, maybe explaining how did you generate the samples could help, ArviZ integrates with most Python PPL libraries (see https://arviz-devs.github.io/arviz/notebooks/InferenceDataCookbook.html) so converting the data to a format understandable by ArviZ is a key part of the process – OriolAbril Jul 22 '20 at 16:17
  • If you were in the 1 chain and 2 parameters situation but loading into ArviZ generated 2 chains and 1 parameter, it is not unexpected to get 2.6 as ESS, you'd be considering the 2 parameters as the same object, whose distribution would most probably be multimodal. ArviZ implements by default the algorithm from https://arxiv.org/pdf/1903.08008.pdf which as said in the bottom of page 4: _"For multimodal distributions with well-separated modes, the split-̂R adjustment leads to an ESS estimate that is close to the number of distinct modes that are found"_ – OriolAbril Jul 22 '20 at 16:20
  • I think it's 1 chain of 2 parameters. In my mcmc algorithm, I start at a 2D coordinate and then walk around, if the move is accepted, the coordinates are stored in the N x 2 chain, until N moves are accepted. I'm trying something like ```az.ess(az.convert_to_inference_data(np.transpose(samples)))``` but now I see why it would be close to 2, although I'm not sure how to fix it, I'll check out the cookbook, thanks! – Lizardinablizzard Jul 22 '20 at 16:30
  • Great, thanks @oriolabril! I now get two different numbers out ~400, but I think I'm actually aiming for one number, because the dimensions are correlated? I don't think it would be right to just add them. I guess I take some euclidean distance in 2D space between the sample points to make the sample array actually 1D, or are there better methods? Edit: ah, I think I need to read the arxiv paper, thanks! – Lizardinablizzard Jul 24 '20 at 00:40
  • You are actually aiming for 2 values, each parameter will have its ess, independently of how correlated they are. Moreover, even with correlated parameters, the two ess can be arbitrariy different – OriolAbril Jul 24 '20 at 08:14