Smoothing a curve with many peaks with Gaussian

Question

I have spectroscopy data with some very sharp peaks as seen in blue curve. I would like to make the peaks a bit more smooth like the orange curve in the plot.

I thought the easiest way to do this is to convolve my data points with Gaussians. I know both numpy and scipy have convolve functions but I am not sure if I need 1D or 2D convolution to get what I need. So far I tried convolve1d and gaussian_filter1d from scipy and convolve from numpy. None of them improved the sharp lines connecting the data points. I also don't know how to choose the correct sigma or weights...

The text file containing the data points is here.

The orange curve is generated from a visualisation programme and I wish to be able to generate it myself with python rather than using the programme.

EDIT:

New link for file

are the "ramps" (i.e. from 1500-3100) part of the data or just artefacts from `plt.plot` connecting points? — Daniel F, Nov 09 '20 at 10:40
looking at your data I have serious doubts that the yellow graph is reflection what is really going on. Whatever the program is doing is highly dangerous for quantitative data analysis of your data at hand. — mikuszefski, Nov 10 '20 at 09:17
@mikuszefski I increased the intensities of the orange plot so that it is much clear (so that orange and blue don't overlap and they have clear lines). This is spectroscopy data so what we care about is where the peaks are. The program is what is usually used for this type of data analysis by many people so I think it should be fine... Why do you think it is dangerous? — beliz, Nov 10 '20 at 13:12
I think from some peaks you only see the falling edges. Estimating a peak from this is almost impossible. I have the impression that the program is just smoothing over this, resulting in rather meaningless peaks. The big problem is of course the lack of data in the gaps. Just out of curiosity, why is that? — mikuszefski, Nov 10 '20 at 15:43
@mikuszefski The falling edges you refer to are the data points I believe. I am performing a vibrational analysis of a moderate-sized system. The molecular simulation programme package I am using (CP2K) only prints the intensities of the vibrational modes of the system, therefore there are no intensity data for certain frequencies (i.e. x axis). I think before trying the convolution I need to fill those "empty" data with only zeros... I used Molden to get the convoluted graph. — beliz, Nov 10 '20 at 18:03
well, if the highest point is on the edge of a local data set...not sure, but I have no idea about that type of quantum chemistry simulations. What I can say is that I can manually reproduce your yellow graph. It is convoluted with Lorentzian though and not with a Gaussian. — mikuszefski, Nov 11 '20 at 09:21
@mikuszefski how did you manually produced the yellow? Do you mind sharing the code? — beliz, Nov 16 '20 at 10:59

score 0 · Answer 1 · answered Nov 09 '20 at 11:05

0

Looks like you want a "kernel density estimator" which is implemented by:

from scipy.stats import gaussian_kde

X = np.random.rand(50) * 3500
Y = np.random.rand(50) * 50
xi = linspace(0, 3500, 1000)

kde = gaussian_kde(X, weights = Y, bw_method = .01)  #tune `bw_method` to get the bandwidth you want
plt.plot(xi, kde.pdf(xi))

You may also need to adjust the y scaling of the graph to match your requirements

answered Nov 09 '20 at 11:05

Daniel F

13,620
2
29
55

So in your example `X` is the `x` coordinates of the blue curve, `xi` is the `x` coordinates of the orange curve and `kde.pdf(xi)` is the y coordinates of the orange curve. However I only have the X coordinates for the blue curve... So I don't understand how your example can help me unless you also show me what I should use for `xi`. – beliz Nov 09 '20 at 12:29
I can't access the website where your text file is located from my work computer, so you should probably include what data you have (at least an example) in the question itself. – Daniel F Nov 09 '20 at 12:34

score 0 · Answer 2 · answered Nov 16 '20 at 12:42

This is manually reproducing the orange curve given in the OP. Turns out it is convoluted with a Lorentzian not Gaussian.


import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks
from scipy.optimize import curve_fit

def gs( x, x0, a, s ):
    return a * np.exp( -( x - x0 )**2 / ( 2 * s**2 ) )

def cs( x, x0, a, s ):
    return a / ( ( x - x0 )**2 + s**2 )

conrange = 40000 

### gasiian is no good
# ~condata = np.fromiter( ( gs(x, 0, 1, 1800 ) for x in np.arange( -5000, 5000 ) ), np.float )
### Cauchy looks much better
condata = np.fromiter( 
    ( 
        cs( x, 0, 1, 2000 ) for x in np.arange( -conrange, conrange ) 
    ), np.float
)
### shift can be zero. 
### Amplitude does not matter as it will be scaled later anyway
### width matters of course, but is adjusted manually for the moment.

data = np.loadtxt("ir_data.txt")
xdata = data[:, 0]
ydata = data[:, 1]

xdataint = np.fromiter( ( int( x* 100 ) for x in xdata ), int ) 
xmin = xdataint[0]
xmax = xdataint[-1]
xfilled = np.arange( xmin , xdataint[-1] + 1 )
yfilled = np.zeros( len( xfilled ), dtype=np.float )
xfloat = np.fromiter( ( x / 100. for x in xfilled), float ) 


for x, y in zip( xdataint, ydata ):
    yfilled[ x - xmin ] = y
### just putting a manual scale here, but the real one can be calculated
### from the convolution properties
yc = 1e6 * np.convolve( condata, yfilled, mode="full" )

xfull = np.arange(
    -conrange + xmin, xmin + conrange + len( xfilled ) - 1
)
xfloat = np.fromiter( ( 0.01 * x for x in xfull ), float )

fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xdata, ydata, ls='', marker='o', ms=2 )
ax.plot( xfloat, yc, ls='-')
plt.show()

Disclaimer

This are preliminary results and only posted due to request from the author of the OP. The might be some refinement.

Smoothing a curve with many peaks with Gaussian

2 Answers2

Linked