
I have an energy spectrum from a cosmic ray detector. The spectrum follows an exponential curve but it will have broad (and maybe very slight) lumps in it. The data, obviously, contains an element of noise.

I'm trying to smooth out the data and then plot its gradient. So far I've been using the SciPy spline function to smooth it and then np.gradient().

As you can see from the picture, np.gradient() just takes finite differences between neighbouring points, so it doesn't show the lumps very clearly.

I basically need a smooth gradient graph. Any help would be amazing!

I've tried 2 spline methods:

import numpy as np
from scipy import interpolate
from scipy.interpolate import spline

def smooth_data(y, x, factor):
    print("smoothing data by interpolation...")
    xnew = np.linspace(min(x), max(x), int(factor*len(x)))
    smoothy = spline(x, y, xnew)
    return smoothy, xnew

def smooth2_data(y, x, factor):
    xnew = np.linspace(min(x), max(x), int(factor*len(x)))
    # evaluate the smoothing spline on the finer grid
    # (the original mistakenly returned a linear interp1d instead)
    f = interpolate.UnivariateSpline(x, y)
    return f(xnew), xnew
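Note that `scipy.interpolate.spline` has been removed from recent SciPy releases. If the code above fails to import, a sketch of the same idea using `make_interp_spline` (the documented replacement) looks like this; the exponential data here is an assumed stand-in, not the real spectrum:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def smooth_data_modern(y, x, factor):
    """Same idea as smooth_data above, using make_interp_spline,
    the documented replacement for the removed spline() helper."""
    xnew = np.linspace(min(x), max(x), int(factor * len(x)))
    smoothy = make_interp_spline(x, y, k=3)(xnew)
    return smoothy, xnew

# stand-in data (assumed exponential spectrum plus noise)
x = np.linspace(0, 10, 50)
y = np.exp(-x) + 0.01 * np.random.default_rng(0).normal(size=50)
sy, xn = smooth_data_modern(y, x, 4)
```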

edit: Tried numerical differentiation:


from scipy.integrate import simps
from scipy.optimize import fmin

def minim(u, f, k):
    """Functional to be minimised to find the optimum u. f is the original data, u the approximation."""
    integral1 = abs(np.gradient(u))
    part1 = simps(integral1)
    part2 = simps(u)
    integral2 = abs(part2 - f)**2.
    part3 = simps(integral2)
    F = k*part1 + part3
    return F


def fit(data_x, data_y, denoising, smooth_fac):
    smy, xnew = smooth_data(data_y, data_x, smooth_fac)
    y0, xnnew = smooth_data(smy, xnew, 1./smooth_fac)
    y0 = list(y0)
    data_y = list(data_y)
    data_fit = fmin(minim, y0, args=(data_y, denoising), maxiter=1000, maxfun=1000)
    return data_fit

However, it just returns the same graph again!

Data, smoothed data and gradients

Lucidnonsense

3 Answers


There is an interesting method published on this: Numerical Differentiation of Noisy Data. It should give you a nice solution to your problem. More details are given in another, accompanying paper. The author also gives Matlab code that implements it; an alternative implementation in Python is also available.

If you want to pursue the spline interpolation method, I would suggest adjusting the smoothing factor s of scipy.interpolate.UnivariateSpline().
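As a minimal sketch of that suggestion (on assumed synthetic data, not the real spectrum): larger s gives a smoother fit, and the spline's derivative() method then yields a smooth gradient directly.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# assumed stand-in for the measured spectrum
x = np.linspace(0, 10, 200)
rng = np.random.default_rng(0)
y = np.exp(-0.5 * x) + 0.01 * rng.normal(size=200)

# s trades fidelity for smoothness: larger s -> smoother curve.
# s=0.02 is an arbitrary starting value; tune it for real data.
spl = UnivariateSpline(x, y, k=4, s=0.02)
dy = spl.derivative()(x)   # smooth gradient, no np.gradient() needed
```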

Another solution would be to smooth your function through convolution (say with a Gaussian).
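A minimal sketch of the convolution idea, again on assumed synthetic data with an arbitrary kernel width: smooth with a Gaussian, then differentiate the smoothed signal (or fold both steps into one call with order=1).

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

x = np.linspace(0, 10, 500)
dx = x[1] - x[0]
y = np.exp(-0.5 * x) + 0.01 * np.random.default_rng(1).normal(size=500)

# sigma is in samples and is an assumption; widen it for heavier smoothing
smooth = gaussian_filter1d(y, sigma=8)
grad = np.gradient(smooth, dx)

# near-equivalent one-step version (differs mainly at the boundaries):
grad2 = gaussian_filter1d(y, sigma=8, order=1) / dx
```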

The paper I linked to claims to prevent some of the artifacts that come up with the convolution approach (the spline approach might suffer from similar difficulties).

Eric O. Lebigot
  • I've tried the numerical differentiation method: see the new attachment – Lucidnonsense Apr 07 '13 at 19:29
  • Line `part2 = simps(u)` is incorrect: `part2` should instead be an array that contains the integral of u from 0 *to each abscissa*. Then you should try an *exponentially varying* smoothing coefficient in order to find the one that suits your needs best. If you really calculate the derivative by taking into account the step size in x, then I expect a good smoothing factor to be about 1e6, for a derivative that lies mostly between -1 and +1—while I may be wrong, you may want to try this value anyway, just in case my back-of-the-envelope calculation is correct. – Eric O. Lebigot Apr 08 '13 at 04:50
  • PS: There is also *no need to smooth your data beforehand*, with the Numerical Differentiation of Noisy Data method. It looks like an interesting method. I am looking forward to your results! Good luck… – Eric O. Lebigot Apr 08 '13 at 11:50
  • PPS: A good initial guess for the solution `u` might be the (noisy) derivative (instead of the function itself). In fact, the procedure described in the paper finds a derivative *without smoothing the function*. – Eric O. Lebigot Apr 09 '13 at 09:51
  • I'm not sure I understand your version of part2. I would implement it as a for loop calculating each integral. But then how would this new array work with the rest of the fitting routine? Also, I smoothed the data to provide the initial guess, would this be fundamentally wrong? I can't see how the noisy derivative would provide a good guess – Lucidnonsense Apr 09 '13 at 14:26
  • You can indeed use a loop for the integrals `part2` (or use the trapezoidal or rectangle rule and calculate it faster with `cumsum()`). According to Eq. (2) in the paper I linked to, you simply use it as you did, integrating `abs(part2-f)` (the difference of two arrays of the same size). As for the initial guess, the algorithm calculates the *derivative* `u` of your data, *not* a smoothed version of your data. – Eric O. Lebigot Apr 10 '13 at 04:16
  • PS: I added a reference to a more detailed version of the algorithm, and to a Matlab code that implements it. The Python scientific community would likely benefit from a Python implementation made available through the Python Package Index! – Eric O. Lebigot Apr 10 '13 at 04:17
  • Thank you for all your help but I'm very busy at the moment and will be unlikely to implement the algorithm. However, when my current project is complete, I will attempt to finish this and post the results. Thanks again! – Lucidnonsense Apr 13 '13 at 16:38
  • Could someone advise me how to deal with the similar problem but non-evenly distributed data? I have my measurement sets for X and Y. Would the algorithm described in the paper be suitable? I've tried to go through the Python implementation, but haven't yet found a way to bring the X measurements into play. – Spu Oct 20 '15 at 08:33
  • @Spu, it only works for evenly distributed data. The function contained in the Python implementation posted here by @EOL takes as an argument the grid spacing `dx`, which is a scalar. – David Aug 08 '17 at 14:31
  • @Spu Note that the algorithm itself can accommodate unevenly spaced X values. It's only the implementation which is limited. It should not be hard to generalize it. – Eric O. Lebigot Aug 09 '17 at 07:45
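For anyone trying the fix described in the comments above, here is a hedged sketch (not the paper's exact algorithm, and assuming a uniform grid): the running integral of the candidate derivative u replaces simps(u), with f[0] added as the integration constant (an assumption about the offset).

```python
import numpy as np

def minim_fixed(u, f, k, dx):
    """Eq.-(2)-style functional: u is the candidate derivative, f the
    noisy data sampled on a uniform grid of spacing dx.

    The question's part2 = simps(u) is replaced by the running integral
    of u (rectangle rule via cumsum), as suggested in the comments.
    """
    antideriv = np.cumsum(u) * dx + f[0]      # integral of u up to each x
    fidelity = np.sum(np.abs(antideriv - f) ** 2) * dx
    roughness = np.sum(np.abs(np.gradient(u, dx))) * dx  # total variation of u
    return k * roughness + fidelity
```

As suggested in the comments, a reasonable initial guess to hand fmin is the raw np.gradient of the data rather than the smoothed data itself.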

I won't vouch for the mathematical validity of this; it looks like the paper from LANL that EOL cited would be worth looking into. Anyway, I've gotten decent results using the built-in differentiation of SciPy's splines via splev.

%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import splrep, splev

x = np.arange(0, 2, 0.008)
data = np.polynomial.polynomial.polyval(x, [0, 2, 1, -2, -3, 2.6, -0.4])
noise = np.random.normal(0, 0.1, 250)
noisy_data = data + noise

f = splrep(x, noisy_data, k=5, s=3)
#plt.plot(x, data, label="raw data")
#plt.plot(x, noise, label="noise")
plt.plot(x, noisy_data, label="noisy data")
plt.plot(x, splev(x, f), label="fitted")
plt.plot(x, splev(x, f, der=1)/10, label="1st derivative")  # scaled down for plotting
#plt.plot(x, splev(x, f, der=2)/100, label="2nd derivative")
plt.hlines(0, 0, 2)
plt.legend(loc=0)
plt.show()

matplotlib output

billyjmc
  • Is it possible to use this method for non-evenly distributed data? Could I add my sets of measurements both for X and Y? – Spu Oct 20 '15 at 08:27
  • The documentation for the function used here (`scipy.interpolate.splrep()`) does not mention any restriction for non-evenly distributed data. Beyond just looking at the documentation, you can also certainly try by yourself by changing the value of `x` in the code. More generally, it is appreciated that you make some visible efforts at answering your own question, on Stack Overflow, so as to save others some time (and make them more likely to take the time to answer your question). – Eric O. Lebigot Oct 20 '15 at 19:48
  • @Spu, yes! I used `splrep` just two days ago to perform a cubic b-spline interpolation of sample data acquired at non-uniform intervals so I could perform an FFT. – billyjmc Oct 21 '15 at 15:42

You can also use scipy.signal.savgol_filter, which fits a local polynomial to the data and can return that polynomial's derivative directly (deriv=1).

Result


Example

import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import savgol_filter
from random import random

# generate data
x = np.array(range(100))/10
y = np.sin(x) + np.array([random()*0.25 for _ in x])

# deriv=1 gives the derivative per sample; alternatively pass delta=0.1
# (the x spacing) to get dy/dx directly instead of rescaling by hand
dydx = savgol_filter(y, window_length=11, polyorder=2, deriv=1)

# Plot result
plt.plot(x, y, label='Original signal')
plt.plot(x, dydx*10, label='1st Derivative')  # *10 compensates for dx = 0.1
plt.plot(x, np.cos(x), label='Expected 1st Derivative')
plt.legend()
plt.show()
Roald
  • Updated docs link: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html – zabop Jan 15 '22 at 16:50