
I want to generate a normal (probability) distribution using numpy.random.normal(), where the sum of the probabilities in a row (over all columns) must be 1.

I used the following code to generate a sample 4 by 3 probability matrix:

import numpy as np

mu, sigma = 0.5, 0.20  # mean and standard deviation
np.random.seed(40)
sample_probability = np.random.normal(mu, sigma, size=(4, 3))

but the sum of the probabilities in each row comes out larger than 1, which I don't want:

 [[0.37849046 0.47477272 0.36307873]
 [0.68574295 0.13111979 0.40659952]
 [0.95849807 0.59776201 0.6420534 ]
 [0.71110689 0.51081462 0.55159068]]

For example, np.sum(sample_probability[0, :]) yields 1.216341905895543, but I want it to be 1.

Would you please share your insights on how I can customize numpy.random.normal() so that the probabilities in each row sum to 1?

Thanks.

[UPDATE] I went with manually normalizing each row rather than modifying numpy.random.normal(). Thanks to Mikhail and Frank.

Md Morshed Alam

4 Answers


First of all, make sure you understand that introducing this modification will change the joint distribution, so that your variables will no longer be distributed as i.i.d. Gaussians with the given mean and standard deviation.

A simple way to do it would be to manually normalize each row by the sum of its entries (after sampling):

sample_probability /= np.sum(sample_probability, axis=1)[:, np.newaxis]
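
A quick sanity check (using sample_probability from the question):

print(sample_probability.sum(axis=1))  # -> [1. 1. 1. 1.]
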
Mikhail Genkin

You have to be clearer about what you want. What sort of distribution do you want at the end? You'll have to change mu, sigma, or both to convert the distribution you have into one whose rows sum to 1.

If you just want to divide each row by its sum:

row_sums = np.sum(sample_probability, axis=1)
result = sample_probability / row_sums[:,None]

Alternatively, you could look at each row, see how the total of that row differs from 1, divide that delta by the number of items in the row, and add delta/n to each element. The result also sums to 1 row-wise, though note that this shift can push individual entries outside [0, 1].
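
For concreteness, a minimal sketch of that additive adjustment, assuming sample_probability from the question:

# Shift every element of a row by the same amount so the row sums to exactly 1.
deltas = 1 - np.sum(sample_probability, axis=1)   # how far each row is from 1
n = sample_probability.shape[1]                   # number of items per row
result = sample_probability + (deltas / n)[:, None]
print(result.sum(axis=1))                         # -> [1. 1. 1. 1.]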

Frank Yellin
  • Thanks for your reply. I got your point. It's not a good idea to change the distribution. I should rather go for the normalization. – Md Morshed Alam Oct 27 '20 at 17:45

random.normal does not generate probabilities. It generates random numbers with a particular normal distribution. For a large number of those values, the mean will be close to the specified mu. In your case each row sum will approximate 3*mu, i.e. 1.5.

In [1]: mu, sigma = 0.5, 0.20  # mean and standard deviation
   ...: np.random.seed(40)
  
In [2]: x = np.random.normal(mu, sigma, size=(4,3))
In [3]: x.mean()
Out[3]: 0.5101817035650666
In [4]: x.mean(axis=1)
Out[4]: array([0.53043459, 0.494771  , 0.4213001 , 0.59422113])
In [5]: x.sum(axis=1)
Out[5]: array([1.59130377, 1.48431299, 1.26390029, 1.78266339])

For a larger dimension:

In [6]: x = np.random.normal(mu, sigma, size=(4,1000))
In [7]: x.mean(axis=1)
Out[7]: array([0.50881455, 0.50950833, 0.49800201, 0.49708817])
In [8]: x.sum(axis=1)
Out[8]: array([508.8145494 , 509.50833417, 498.00200654, 497.08816538])

We could scale the values so the row sum is 1, but the row mean will no longer be mu:

In [19]: x1 = x/x.sum(axis=1, keepdims=True)
In [20]: x1.sum(axis=1)
Out[20]: array([1., 1., 1., 1.])
In [21]: x1.mean(axis=1)
Out[21]: array([0.001, 0.001, 0.001, 0.001])
hpaulj

It would probably make more sense to use a Dirichlet distribution for this. Whereas a normal distribution can theoretically generate any number (some with very low probability), a Dirichlet distribution by definition generates sets of n numbers that add up to one.

If, as you say, you are looking for a matrix of probabilities, well, that's exactly what a Dirichlet distribution is for! It's a probabilistic way to generate probabilities. (Probabilities for a Multinomial distribution, to be precise.)

Here's a simple usage example:

import numpy

prob_mat = numpy.random.dirichlet([5, 5, 5, 5], 4)
print(prob_mat)

Output:

[[ 0.22564822  0.31584644  0.22485089  0.23365445]
 [ 0.16188422  0.3077273   0.35070738  0.1796811 ]
 [ 0.33209931  0.32359204  0.11584078  0.22846787]
 [ 0.21951849  0.02267694  0.50503356  0.25277101]]

With equal values in the first argument, as here, every position has the same expected value. If you want to give more weight to some positions than others, pass larger or smaller numbers there. The number of elements in the first argument determines the length of each row.

prob_mat = numpy.random.dirichlet([1, 9], 4)
print(prob_mat)

Output:

[[ 0.09191857  0.90808143]
 [ 0.05854907  0.94145093]
 [ 0.12310873  0.87689127]
 [ 0.10848055  0.89151945]]
senderle
  • Thank you for your input. I have a question for you, by the way. I am actually working with a Hidden Markov Model, and I am assuming that the state space (for hidden variables) and the observation space are discrete. That's why I intend to use a Normal distribution. I don't know very much about the Dirichlet distribution. Can you share any insights on using a Dirichlet distribution for an HMM? – Md Morshed Alam Oct 27 '20 at 19:52
  • @MdMorshedAlam, I don't know a lot about HMMs. However, I can make some guesses. My understanding is that an HMM involves two sets of probability tables, both of which are basically sets of Multinomial distributions (i.e. weighted die throws). In stats parlance, the Dirichlet distribution is the "conjugate prior" of the Multinomial distribution, which means that in some sense it is the natural way to create a distribution over possible Multinomial distributions. – senderle Oct 27 '20 at 20:09
  • My knowledge of Dirichlet distributions comes from LDA topic modeling, where they are used to model a two-step generative process, in which each document in a set of documents is associated with a Multinomial distribution over topics, and each topic is associated with a Multinomial distribution over words. The mathematical relationship between the Dirichlet distribution and the Multinomial distribution means that you can essentially create a feedback loop between them. – senderle Oct 27 '20 at 20:12
  • You draw a Multinomial distribution from a Dirichlet distribution, then sample counts from the Multinomial distribution, and then feed those counts back into the alpha parameter for the Dirichlet distribution. In LDA, that process eventually converges on a stable set of document-topic and topic-word Multinomial distributions. I don't know how to adapt that to HMMs but I bet it's not too different! (A rough sketch of this loop follows the thread.) [This article](https://www.ccs.neu.edu/home/vip/teach/DMcourse/5_topicmodel_summ/notes_slides/sampling/darling-lda.pdf) has a detailed description of the practical details for LDA. – senderle Oct 27 '20 at 20:17
  • Interesting! Thanks for sharing your insights. I will dig into it. – Md Morshed Alam Oct 27 '20 at 20:19
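
As an illustrative sketch (not code from the thread) of the Dirichlet-Multinomial feedback loop described in the comments above; the update alpha += counts here is a simplified stand-in for the Gibbs-sampling machinery used in real LDA implementations:

import numpy as np

rng = np.random.default_rng(0)
alpha = np.ones(3)                        # initial Dirichlet concentration parameters

for _ in range(10):
    probs = rng.dirichlet(alpha)          # draw one Multinomial distribution from the Dirichlet
    counts = rng.multinomial(100, probs)  # sample 100 counts from that Multinomial
    alpha = alpha + counts                # feed the counts back into the Dirichlet's alpha

print(alpha / alpha.sum())                # the implied average Multinomial weights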