0

I'm trying to convolve two PDFs, but the result is not making sense to me.

I'm convolving a triangular and a uniform distribution. None of them is centered in zero.

Here's the code:

import numpy as np
import plotly.express as px

# X values for bot
x_vec = np.linspace(0, 1, 100000, dtype='float32') # Distribution values

# Uniform distribution
p_x = np.where((x_vec >= 0.2) & (x_vec <= 0.5),
                        1/0.3, 0)
pdf_1 = p_x / np.sum(p_x) # Scaling to be a PDF

# Triangular distribution
p_x = np.where((x_vec >= 0.1) & (x_vec <= 0.3),
                        np.interp(x_vec, [0.1, 0.2, 0.3], [0, 10, 0]), 0)
pdf_2 = p_x / np.sum(p_x) # Scaling to be a PDF

# Convolution
conv_pdf = np.convolve(pdf_1, pdf_2, 'full')

# Plotting
fig = px.scatter()
fig.add_scatter(x=x_vec, y=pdf_1, name='pdf_sample_1')
fig.add_scatter(x=x_vec, y=pdf_2, name='pdf_sample_2')
fig.add_scatter(x=x_vec, y=conv_pdf, name='convolution')
fig.add_vline(x=0.35) # Added to show start of convolution != 0
fig.add_vline(x=0.85) # Added to show end of convolution != 0
fig.show()

I expect the convolution of a Uniform(0.2, 0.5) and a triangular(0.1, 0.2, 0.3) to be located in (0.3, 0.8). As shown in the image below, the convolution locates in (0.35, 0.85).

enter image description here

Can anyone explain to me what I'm doing wrong?

ArthurOA
  • 23
  • 5

1 Answers1

2

It does locate in (0.3, 0.8)

import matplotlib.pyplot as plt
plt.plot(x,pdf_1)
plt.plot(x,pdf_2)
plt.plot(x,conv_pdf[:len(x)]) # Since you used 'full', shape of conv_pdf is len(pdf_1)+len(pdf_2)-1=199999. Corresponding to result in interval [0,2[. Which is expected
plt.axvline(0.3)
plt.axvline(0.8)

enter image description here

Only difference between this and your code, is my usage of matplotlib rather than plotly. And, probably more importantly, the truncation, since full convolution has a bigger shape than the input arrays. Index i being Σaₘbᵢ₋ₘ, so if you consider index 0 to be x=0, and index 99999 to be x=1, that is the estimation of ∫a(x)b(i/99999-x)dx, which is (a*b)(i/99999). So a*b in interval [0,1] is indeed conv_pdf[:100000]. The fact that conv_pdf has more value, simply match the fact that you also computed (a*b)(x) for all x in [0,2[.

That is why I am confident in the fact that plotting conv_pdf[:100000] is the right thing to do.

As for why you get another result, I don't really know. But I can surmise that is because plotly reacts differently from matplotlib when your x array and y array for a plot have different length. Matplotlib would just fail if you plt.plot(x,conv_pdf). While plotly apparently reinterpret, or interpolate, or something, the data. But the point is, it is quite normal that this line
fig.add_scatter(x=x_vec, y=conv_pdf, name='convolution')
has a strange behaviour, since x and y do not match in size

Edit

Out of curiosity, I've installed plotly, and tried your code: no shift. And no error raised. So half of what I said is true: plotly does react differently than matplotlib. Matplotlib would fail, because conv_pdf hasn't the same shape as x_vec. Plotly, indeed doesn't. But, at least with the version I have, it just truncates the plot, and plot only the part that makes sense (so it plots only 100000 first value of conv_pdf and drop the rest. Exactly as I did with my plt.plot(x, conv_pdf[:100000]).

So, my answer should have been rather a vote for "non reproducible or caused by a typo". Since, what you've here, is

  • either a bug in your plotly version (bug probably related to the different size, tho. I would have hard time to imagine that any plotly version could just shif a plain scatter(x,y) with x and y of the same size).
  • or a problem in your experiment (obviously, your code is not exactly what you posted, since you name the x coordinates x first, then x_vec; so I take that the code you posted is an attempt to summarize experiments you made in a notebook or an interactive interpreter. So, are you sure that conv_pdf is the correct one) ?

Either way, when I copy literaly your code in my interpreter, with the only correction of adding a x_vec=x alias, I get with plotly the exact same plot than my previous one (but with the style of yours)

Result of your code (edit in edit: in the meantime, while I was typing this, you corrected your code. So, to be extrasure, I just redid the figure with a verbatim copy&paste of your cod, and this is what I get)

enter image description here

chrslg
  • 9,023
  • 5
  • 17
  • 31
  • Thanks! I plotted the code you provided and it worked. The problem I had was that I was using a different `x_vec = np.linspace(-0,05, 1, 100000, dtype='float32')` instead of starting from zero. Still puzzled about why that changes the result – ArthurOA Aug 03 '23 at 18:43
  • There is no information in `y` (`pdf_1`, `pdf_2` or `conv_pdf`) about the `x` coordinate. The fact that you used `x_vec` to construct `pdf_1` and `pdf_2` is just utilitary (it is just a fancy way to say "first 20000 values are 0, then next 30000 values are 1/0.3, then next 50000 values are 0"). But `pdf_1` and `pdf_2` are just arrays of 100000 values, with no relation at all with interval [0,1]. – chrslg Aug 03 '23 at 18:50
  • 1
    Likewise, `conv_pdf` has nothing to do with that interval [0,1] neither. The only argument you've passed to `convolve` are 2 100000 values arrays, so it returns a 199999 values arrays. How you interpret it, is up to you. – chrslg Aug 03 '23 at 18:51
  • So those arrays contain just a bunch of values, indexed from 0 to 99999 (199999 for `conv_pdf`). The only way plotly (or matplotlib) can know how to interpret those 100000 values (where to draw them on the x axis, how to label them) is because you pass another array (`x_vec`) to tell it. To tell "you must interpret those 100000 values are samples from x=0 to x=1". So, of course, if you pass another `x` array, then it will use another interpretation. – chrslg Aug 03 '23 at 18:53
  • 1
    If you had passed `np.linspace(5,12,100000)` as `x` array, meaning "those 100000 values I want to plot have x-coordinate from 5 to 12", then the curve would have been between 5 and 12 (with non 0 values between 5+(12-5)×30000/100000 and 5+(12-5)×80000/100000, that is between 7.1 and 10.6. So, you would have had a flat line between 5 and 7.1, then that flatten bell curve between 7.1 and 10.6, then a flat line between 10.6 and 12. @ArthurOA – chrslg Aug 03 '23 at 18:56
  • Thanks! That was the issue. Thanks for the clarification regarding the x array and the ralationship with the convoltuion. – ArthurOA Aug 04 '23 at 18:14