1

I want to create an array with unequally spaced values. The spacing should be determined by the superposition of (for example) two normal distributions with different mean and width values. For a single (normal) distribution I managed to get what I want with the help of this post: python, weighted linspace

Using this code:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

dist = stats.norm(loc=1.2, scale=0.6)
bounds = dist.cdf([0, 2])
pp = np.linspace(*bounds, num=21)
vals = dist.ppf(pp)

plt.plot(vals, [1]*vals.size, 'o')
plt.show()

I get the result I want for a single distribution:

enter image description here

However, I need exactly the same for a superposition of two normal distributions like:

dist1 = stats.norm(loc=3, scale=2)
dist2 = stats.norm(loc=1.2, scale=0.6)

This is how a histrogramm of the superimposed distributions looks like:

enter image description here

As a temporary solution I created the arrays for each distribution individually and added them together. However, this is not exactly what I want, because adding the the two individual arrays leads to fluctuating step sizes between the added arrays (for example it might happen that two values from the two different (individual) arrays are almost or exactly identical).

I also tried to define a new distribution that inherits from rv_continuous class from scipy.stats, but I failed to implement two different mean/width parameters.

I am pretty sure that it should work adding the individual probability density functions, but unfortunately I also failed with this approach.

Thanks in advance for any help and/or comment!

koxx
  • 317
  • 4
  • 15
  • When you say "added them together" did you mean adding the numpy arrays or overlaying them like so vals = np.array(list(dist1.ppf(pp)) + list(dist2.ppf(pp)))? This seems to be what you are looking for (handling identical entries can be added, if needed)? – GrimTrigger Dec 23 '20 at 13:15

1 Answers1

1

You could subclass rv_continuous and provide a pdf that is the mean of the two given pdfs.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

class sum_gaussians_gen(stats.rv_continuous):
    def _pdf(self, x):
        return (stats.norm.pdf(x, loc=3, scale=2) + stats.norm.pdf(x, loc=1.2, scale=0.6)) / 2

dist = sum_gaussians_gen()
bounds = dist.cdf([0, 7])
pp = np.linspace(*bounds, num=21)
vals = dist.ppf(pp)

plt.plot(vals, [0.5] * vals.size, 'o')
xs = np.linspace(0, 7, 500)
plt.plot(xs, dist.pdf(xs))
plt.ylim(ymin=0)
plt.show()

resulting plot

JohanC
  • 71,591
  • 8
  • 33
  • 66
  • thanks for your answer btw. Helped me a lot! For a similar application I need to draw random numbers from such a distribution. Using ```dist.rvs(size=n)``` for this is unfortunately VERY slow. I created a new question for this [Speed up drawing random values from superimposed truncated normal distributions](https://stackoverflow.com/questions/65992160/speed-up-drawing-random-values-from-superimposed-truncated-normal-distributions) and thought you might be able to help here as well? Thanks again! – koxx Feb 02 '21 at 10:12