
Suppose I have a continuous probability distribution, e.g., Normal, on a support A. Suppose that there is a Matlab code that allows me to draw random numbers from such a distribution, e.g., this.

I want to build a Matlab code to "approximate" this continuous probability distribution with a probability mass function spanning over r points.

This means that I want to write a Matlab code to:

(1) Select r points from A. Let us call these points a1,a2,...,ar. These points will constitute the new discretised support.

(2) Construct a probability mass function over a1,a2,...,ar. This probability mass function should "well" approximate the original continuous probability distribution.

Could you also help by providing an example? This is a similar question asked for Julia.


Here are some of my thoughts. Suppose that the continuous probability distribution of interest is one-dimensional. One way to go could be:

(1) Draw 10^6 random numbers from the continuous probability distribution of interest and store them in a column vector D.

(2) Suppose that r=10. Compute the 10th, 20th, ..., 90th percentiles of D; these split the draws into 10 bins. Find the median of the draws falling in each of the 10 bins. Call these median points a1,...,ar.

How can I construct the probability mass function from here? Also, how can I generalise this procedure to more than one dimension?
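The quantile-bin idea above can be sketched as follows. This is only a minimal illustration under stated assumptions: `randn` stands in for the sampler of the distribution of interest, `P` is divisible by `r`, and the bins are formed by sorting the draws rather than by a percentile function, so only base MATLAB is needed. Because every bin holds the same share of the draws, each support point receives mass 1/r.

```matlab
% Sketch: r equal-probability bins built from sorted draws (base MATLAB only).
rng default
P = 10^6;                      % number of draws
r = 10;                        % number of support points (assumes mod(P,r)==0)
D = randn(P,1);                % stand-in for the sampler of interest

Ds = sort(D);                  % sorted draws
G  = reshape(Ds, P/r, r);      % column k holds the draws in the k-th quantile bin
a  = median(G, 1).';           % support points a1,...,ar: median of each bin
pmass = ones(r,1)/r;           % every bin contains the same fraction of draws

% Simple checks: masses sum to one; the discrete mean is close to mean(D)
total_mass    = sum(pmass);
discrete_mean = a.'*pmass;
```

With equal-probability bins the support points are dense where the distribution has most of its mass and sparse in the tails, which is a design choice rather than a requirement.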


Update using histcounts: I thought about using histcounts. Do you think it is a valid option? For many dimensions I can use this.

clear 

rng default

%(1) Draw P random numbers from the standard normal distribution
P=10^6;
X = randn(P,1);

%(2) Apply histcounts
[N,edges] = histcounts(X); 

%(3) Construct the new discrete random variable

%(3.1) The support of the discrete random variable is the collection of the midpoints of the bins
supp=(edges(1:end-1)+edges(2:end)).'/2;

%(3.2) The probability mass function of the discrete random variable is the
%number of X within each bin divided by P
pmass=N/P;

%(4) Check if the approximation is OK
%(4.1) Find the CDF of the discrete random variable
%(cumulative share of X up to the right edge of each bin)
CDF_discrete=cumsum(N).'/P;
%(4.2) Plot empirical CDF of the original random variable and CDF_discrete
ecdf(X)
hold on
scatter(supp, CDF_discrete)
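For two dimensions, the same histogram idea might look like the sketch below. It assumes `histcounts2` is available (it is part of recent MATLAB releases), uses two independent standard normals as a stand-in for the distribution of interest, and fixes `r` bins per dimension, which is an arbitrary choice made for illustration.

```matlab
% Sketch: 2-D discretisation with histcounts2 on an r-by-r grid of bins.
rng default
P = 10^6;                                 % number of draws
r = 10;                                   % bins per dimension (assumed)
X = randn(P,2);                           % stand-in: independent standard normals

[N, xe, ye] = histcounts2(X(:,1), X(:,2), [r r]);

xc = (xe(1:end-1) + xe(2:end))/2;         % bin midpoints, first dimension
yc = (ye(1:end-1) + ye(2:end))/2;         % bin midpoints, second dimension
[A1, A2] = ndgrid(xc, yc);                % support points (A1(i,j), A2(i,j))

pmass = N/P;                              % pmass(i,j): mass at (A1(i,j), A2(i,j))
```

The number of support points grows as r^d with the dimension d, so a full grid like this is only practical for a small number of dimensions.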
TEX
  • A probability mass function is a function that gives the probability that a _discrete_ random variable is exactly equal to some value (https://en.wikipedia.org/wiki/Probability_mass_function). But perhaps what you're after is to model a _continuous_ probability distribution using a "discretised support" of (say) 10 points. The best way to do that would depend on how you intend to interpolate between these points, e.g. linear interpolation, cubic interpolation, or support-vector machines? Most interpolation methods generalize to higher dimensions using a grid of support vectors. – esskov Aug 10 '19 at 16:31
  • Thank you. Could you give an example of how any of the methods you suggest (e.g. linear interpolation) works for what I want? Thanks. – TEX Aug 10 '19 at 18:00
  • Also, at the end I want a probability mass function (summing up to 1). Thanks – TEX Aug 10 '19 at 18:13
  • See https://www.mathworks.com/help/matlab/interpolation.html – Argyll Aug 11 '19 at 07:16
  • Thanks to everyone. I'm not an expert and it would be helpful if you could add an example, e.g., suppose I want to model a standard normal distribution using a discretised support. – TEX Aug 11 '19 at 11:32
  • This should help: https://www.mathworks.com/help/stats/prob.normaldistribution.pdf.html `mu = 1; sigma = 3; D = normrnd(mu, sigma, [1000000,1]); y = pdf('Normal',D,mu,sigma); scatter(D,y)` – a11 Aug 11 '19 at 22:39
  • @AlexS1 It doesn't work: where is the probability mass function? – TEX Aug 12 '19 at 07:30
  • @esskov What do you think about using `histcounts` (see my attempt added to the question). – TEX Aug 12 '19 at 08:33

1 Answer


I don't know if this is what you're after, but maybe it can help you. For a continuous probability distribution, P(X = x) = 0 at every point x: the probability of X landing exactly on x is infinitesimally small and is therefore regarded as 0.

What you could do instead, in order to approximate it by a discrete probability space, is to define some points (x_1, x_2, ..., x_n) and let the probability of each point be the integral of the PDF of your continuous distribution over a range around it, that is, P(x_1) = P(X in (-inf, x_1_end]), P(x_2) = P(X in (x_1_end, x_2_end]), ..., P(x_n) = P(X in (x_(n-1)_end, +inf)). Here x_k_end denotes the cut point between x_k and x_(k+1). Because these ranges partition the real line, the probabilities sum to 1.
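For a standard normal, a minimal sketch of this construction could look as follows. The choices here are assumptions made for illustration only: the support points `x` are an evenly spaced grid on [-2.5, 2.5], the cut points `b` (playing the role of x_k_end) are the midpoints between neighbouring support points, and the normal CDF is written with the base-MATLAB function `erfc` so that no toolbox is needed.

```matlab
% Sketch: PMF from exact interval probabilities of the standard normal.
r = 10;                                  % number of support points
x = linspace(-2.5, 2.5, r).';            % support points (arbitrary choice)
b = (x(1:end-1) + x(2:end))/2;           % interior cut points x_1_end,...,x_(r-1)_end

Phi = @(t) 0.5*erfc(-t/sqrt(2));         % standard normal CDF via erfc

% P(x_k) = Phi(upper cut of bin k) - Phi(lower cut of bin k),
% with -Inf and +Inf as the outermost cuts
pmass = diff(Phi([-Inf; b; Inf]));
```

The first and last points absorb the whole tail mass, so the result sums to 1 without any renormalisation; for another distribution, `Phi` would be replaced by that distribution's CDF.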

:-)

Bjarke Kingo