3

I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix: A(x,y) = probability

This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.

I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.

Thank you.

yms
  • 10,361
  • 3
  • 38
  • 68
Miguel
  • 33
  • 1
  • 3
  • I can't give you code. But if you don't find something in the documentation, you can implement this yourself. You only need to be able to sample from a discrete distribution. This [Wiki-Article](https://en.wikipedia.org/wiki/Pseudo-random_number_sampling#Finite_discrete_distributions) shows some methods and some are very easy to implement! If speed is not that important: pick linear search. If speed is important: pick the Alias-Method. – sascha Jul 28 '15 at 00:30
  • I don't think this question should be asked here. to calculate the mean and other moments from a arbitrary PDF is always hard, but if you can obtain the conditional probability: x|y and y|x, then you can use `gibbs sampling` to get what you want. you can find a example [here](http://timsalimans.com/the-power-of-jit-compilation/). – Gnimuc Jul 28 '15 at 04:49

2 Answers2

6

If you really have a discrete probably density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.

%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2));  %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)];  %all y values

%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);

%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of of cumsum is not zero

%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];

%generate random values
N_vals = 1000;  %give me 1000 values
rand_vals = rand(N_vals,1);  %spans zero to one

%look into CDF to see which index the rand val corresponds to
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));

%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];

I hope that this helps!

Chip

chipaudette
  • 1,655
  • 10
  • 13
1

I don't believe matlab has built-in functionality for generating multivariate random variables with arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can be easily generated based on the cumulative distribution function, the CDF does not exist for multivariate distributions, so generating such numbers is much more messy (the main problem is the fact that 2 or more variables have correlation). So this part of your question is far beyond the scope of this site.

Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:

%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);

%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1

%plot it if you like
%figure;
%surf(x,y,A)

%actual half-answer starts here    

%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2

%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));

So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things which can be parametrized with x and y (which are naturally the same size as you mesh), then you can get along without having to do any additional thinking.

You could also define a function for the integration:

function res=trapz2(xv,yv,A,arg)

if ~isscalar(arg) && any(size(arg)~=size(A))
    error('Size of A and var must be the same!')
end

res=trapz(xv,trapz(yv,A.*arg));

end

This way you can compute stuff like

weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);

NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.

  • 1
    Good call on the 101x100 mesh instead of the 100x100 mesh. It can be really tricky (but *very* important) to get the dimensions and shape correct throughout one's code. Making the arrays not-square is a great approach to getting it right! – chipaudette Jul 28 '15 at 23:46