Unexpected shift in convolution of PDFs

Question

I'm trying to convolve two PDFs, but the result is not making sense to me.

I'm convolving a triangular and a uniform distribution. None of them is centered in zero.

Here's the code:

import numpy as np
import plotly.express as px

# X values for bot
x_vec = np.linspace(0, 1, 100000, dtype='float32') # Distribution values

# Uniform distribution
p_x = np.where((x_vec >= 0.2) & (x_vec <= 0.5),
                        1/0.3, 0)
pdf_1 = p_x / np.sum(p_x) # Scaling to be a PDF

# Triangular distribution
p_x = np.where((x_vec >= 0.1) & (x_vec <= 0.3),
                        np.interp(x_vec, [0.1, 0.2, 0.3], [0, 10, 0]), 0)
pdf_2 = p_x / np.sum(p_x) # Scaling to be a PDF

# Convolution
conv_pdf = np.convolve(pdf_1, pdf_2, 'full')

# Plotting
fig = px.scatter()
fig.add_scatter(x=x_vec, y=pdf_1, name='pdf_sample_1')
fig.add_scatter(x=x_vec, y=pdf_2, name='pdf_sample_2')
fig.add_scatter(x=x_vec, y=conv_pdf, name='convolution')
fig.add_vline(x=0.35) # Added to show start of convolution != 0
fig.add_vline(x=0.85) # Added to show end of convolution != 0
fig.show()

I expect the convolution of a Uniform(0.2, 0.5) and a triangular(0.1, 0.2, 0.3) to be located in (0.3, 0.8). As shown in the image below, the convolution locates in (0.35, 0.85).

Can anyone explain to me what I'm doing wrong?

@Guimoute, Thanks for póinting that out!. `x_vec` refers to `x`. Will edit to leave the question consistent — ArthurOA, Aug 03 '23 at 18:09

chrslg · Answer 1 · 2023-08-03T18:21:33.713

It does locate in (0.3, 0.8)

import matplotlib.pyplot as plt
plt.plot(x,pdf_1)
plt.plot(x,pdf_2)
plt.plot(x,conv_pdf[:len(x)]) # Since you used 'full', shape of conv_pdf is len(pdf_1)+len(pdf_2)-1=199999. Corresponding to result in interval [0,2[. Which is expected
plt.axvline(0.3)
plt.axvline(0.8)

Only difference between this and your code, is my usage of matplotlib rather than plotly. And, probably more importantly, the truncation, since full convolution has a bigger shape than the input arrays. Index i being Σaₘbᵢ₋ₘ, so if you consider index 0 to be x=0, and index 99999 to be x=1, that is the estimation of ∫a(x)b(i/99999-x)dx, which is (a*b)(i/99999). So a*b in interval [0,1] is indeed conv_pdf[:100000]. The fact that conv_pdf has more value, simply match the fact that you also computed (a*b)(x) for all x in [0,2[.

That is why I am confident in the fact that plotting conv_pdf[:100000] is the right thing to do.

As for why you get another result, I don't really know. But I can surmise that is because plotly reacts differently from matplotlib when your x array and y array for a plot have different length. Matplotlib would just fail if you plt.plot(x,conv_pdf). While plotly apparently reinterpret, or interpolate, or something, the data. But the point is, it is quite normal that this line
fig.add_scatter(x=x_vec, y=conv_pdf, name='convolution')
has a strange behaviour, since x and y do not match in size

Edit

Out of curiosity, I've installed plotly, and tried your code: no shift. And no error raised. So half of what I said is true: plotly does react differently than matplotlib. Matplotlib would fail, because conv_pdf hasn't the same shape as x_vec. Plotly, indeed doesn't. But, at least with the version I have, it just truncates the plot, and plot only the part that makes sense (so it plots only 100000 first value of conv_pdf and drop the rest. Exactly as I did with my plt.plot(x, conv_pdf[:100000]).

So, my answer should have been rather a vote for "non reproducible or caused by a typo". Since, what you've here, is

either a bug in your plotly version (bug probably related to the different size, tho. I would have hard time to imagine that any plotly version could just shif a plain scatter(x,y) with x and y of the same size).
or a problem in your experiment (obviously, your code is not exactly what you posted, since you name the x coordinates x first, then x_vec; so I take that the code you posted is an attempt to summarize experiments you made in a notebook or an interactive interpreter. So, are you sure that conv_pdf is the correct one) ?

Either way, when I copy literaly your code in my interpreter, with the only correction of adding a x_vec=x alias, I get with plotly the exact same plot than my previous one (but with the style of yours)

Result of your code (edit in edit: in the meantime, while I was typing this, you corrected your code. So, to be extrasure, I just redid the figure with a verbatim copy&paste of your cod, and this is what I get)

Thanks! I plotted the code you provided and it worked. The problem I had was that I was using a different `x_vec = np.linspace(-0,05, 1, 100000, dtype='float32')` instead of starting from zero. Still puzzled about why that changes the result — ArthurOA, Aug 03 '23 at 18:43
There is no information in `y` (`pdf_1`, `pdf_2` or `conv_pdf`) about the `x` coordinate. The fact that you used `x_vec` to construct `pdf_1` and `pdf_2` is just utilitary (it is just a fancy way to say "first 20000 values are 0, then next 30000 values are 1/0.3, then next 50000 values are 0"). But `pdf_1` and `pdf_2` are just arrays of 100000 values, with no relation at all with interval [0,1]. — chrslg, Aug 03 '23 at 18:50
Likewise, `conv_pdf` has nothing to do with that interval [0,1] neither. The only argument you've passed to `convolve` are 2 100000 values arrays, so it returns a 199999 values arrays. How you interpret it, is up to you. — chrslg, Aug 03 '23 at 18:51
So those arrays contain just a bunch of values, indexed from 0 to 99999 (199999 for `conv_pdf`). The only way plotly (or matplotlib) can know how to interpret those 100000 values (where to draw them on the x axis, how to label them) is because you pass another array (`x_vec`) to tell it. To tell "you must interpret those 100000 values are samples from x=0 to x=1". So, of course, if you pass another `x` array, then it will use another interpretation. — chrslg, Aug 03 '23 at 18:53
If you had passed `np.linspace(5,12,100000)` as `x` array, meaning "those 100000 values I want to plot have x-coordinate from 5 to 12", then the curve would have been between 5 and 12 (with non 0 values between 5+(12-5)×30000/100000 and 5+(12-5)×80000/100000, that is between 7.1 and 10.6. So, you would have had a flat line between 5 and 7.1, then that flatten bell curve between 7.1 and 10.6, then a flat line between 10.6 and 12. @ArthurOA — chrslg, Aug 03 '23 at 18:56
Thanks! That was the issue. Thanks for the clarification regarding the x array and the ralationship with the convoltuion. — ArthurOA, Aug 04 '23 at 18:14

Unexpected shift in convolution of PDFs

1 Answers1

Edit