
I am trying to fit Brillouin spectra (with several peaks) using scipy.optimize.curve_fit. I have multiple spectra, each with several peaks, and I am trying to fit them with Lorentzian functions (one Lorentzian per peak). I am trying to automate the process for bulk analysis (i.e., using the peak-finding algorithm of scipy to get peak positions, peak widths and peak heights and use them as initial guesses for the fit). For now I am working on a single spectrum to see if the general idea works; then I will extend it to run automatically on all the spectra I have. So far, I have done this:

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
from scipy.optimize import curve_fit

#import y data from linked google sheet 
y_data = np.loadtxt( 'y_peo.tsv' )
#define x data 
x_data = np.arange( len( y_data ) )

#find peak properties (peak position, amplitude, full width at half maximum)
#to use as initial guesses for the curve_fit function
pk, properties = find_peaks(y_data, height=3, width=3, prominence=0.1)
#pk holds the peak positions, properties the other quantities associated with the peaks
I = properties['peak_heights']   #amplitudes
fwhm = properties['widths']      #full widths at half maximum

#define sum of Lorentzians; params come in triplets (centre x_0, amplitude I, width y_0)
def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        x_0 = params[i]
        I = params[i+1]
        y_0 = params[i+2]
        y = y + (I*y_0**2)/(y_0**2 +(x-x_0)**2) 
    return y

#initial guess list, to be populated with parameters found above (pk, I, fwhm)
guess = [] 

for i in range(len(pk)): 
    guess.append(pk[i])
    guess.append(I[i])
    guess.append(fwhm[i]) 

#convert list to array
guess=np.array(guess)

#fit
popt, pcov = curve_fit(func, x_data, y_data, p0=guess, method = 'lm',  maxfev=1000000)
print(popt)
fit = func(x_data, *popt)

#plotting
fig= plt.figure(figsize=(10,5))
ax= fig.add_subplot(1,1,1)
ax.plot(pk, y_data[pk], 'o', ms=5)
ax.plot(x_data, y_data, 'ok', ms=1)
ax.plot(x_data, fit , 'r--', lw=0.5)

Where y_peo is the dependent variable (here are the values as a Google Sheet file: https://docs.google.com/spreadsheets/d/1UB2lqs0Jmhthed7B0U9rznqcbpEMRmRX99c-iOBHLeE/edit?usp=sharing) and pixel is the independent variable (an arbitrary array of values created in the code). This is what I get (plot: result of the spectrum fit). Any idea why the last triplet is not fitted as expected? I checked, and all the peaks are found correctly by the scipy.signal.find_peaks function, hence I think the problem lies with scipy.optimize.curve_fit, since I had to increase maxfev for the fit to 'work'. Any idea on how to tackle this problem in a smarter way?
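
For reference, a minimal sketch of a sanity check on the starting point (reusing the func, x_data, y_data and guess defined above): plotting the model at the initial guess before fitting shows whether the peak-finding estimates are already close to the data.

#plot the model evaluated at the initial guess to judge the starting point
initial_model = func(x_data, *guess)

fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(1,1,1)
ax.plot(x_data, y_data, 'ok', ms=1, label='data')
ax.plot(x_data, initial_model, 'b-', lw=0.5, label='initial guess')
ax.legend()
plt.show()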

Thanks in advance,

Giuseppe.

Giuseppe_C
  • did you have a look [here](https://stackoverflow.com/a/52608250/803359)? – mikuszefski May 08 '20 at 08:15
  • You might be interested in [peak-o-mat](http://qceha.net). It is a curve fitting software written in python/scipy that can be scripted. – Christian K. May 09 '20 at 14:23
  • thank you @ChristianK. I tried to download your peak-o-mat software using Safari on a MacBook (using the Download MacOS link on the page), but I get this error: **The requested URL /downloads/peak-o-mat-1.2.9_macosx-10.9-x86_64.zip was not found on this server.** Any idea why? Thanks. – Giuseppe_C May 09 '20 at 16:31
  • @GiuseppeCiccone Thanks for letting me know. I fixed the link. Please try again. – Christian K. May 09 '20 at 22:02
  • @ChristianK. the link now works; however, I cannot decompress the zip file. This is the error: **Unable to expand 'peak-o-mat-1.2.9_macosx-10.11-x86_64.zip' into _specified directory_ (Error -1 - Undefined error:0.)**. Hope this helps. Thanks. – Giuseppe_C May 10 '20 at 10:02
  • @mikuszefski I tried to run the code you suggested with my spectrum (same as shown in my question); it takes a long time, and at every iteration I get this warning: RuntimeWarning: Number of calls to function has reached maxfev = 4600. warnings.warn(errors[info][0], RuntimeWarning) (for an increasing number of maxfev each time...). Once it finishes running, this is the result I get: https://drive.google.com/file/d/1Wu5no1uuMHg0ynyxp_C3_H3rszFBANf_/view?usp=sharing - which is quite poor. I am not sure what's the best way to approach this problem. Any help would be appreciated :) – Giuseppe_C May 10 '20 at 14:08
  • @GiuseppeCiccone Please try again. The file was indeed corrupted. You might need to refresh the download page. – Christian K. May 10 '20 at 16:24
  • @GiuseppeCiccone Please improve your code so that we can run it ([minimal example](https://stackoverflow.com/help/minimal-reproducible-example)). Currently there are several undefined symbols. – Christian K. May 10 '20 at 18:36
  • Hi... the code that is linked has some internal parameters that need to be modified; it then works quite OK. It is, as you mentioned, slow, though. However, your Lorentzian shapes are not so nice, such that deviations in the large peaks are actually more pronounced than the smallest peak of the data. My code, hence, does not catch this without additional modifications. Instead of doing this... just let me modify your code to make it run... see below. – mikuszefski May 11 '20 at 08:18
  • @ChristianK. thank you for your help and suggestions. Please see code above (updated to reflect a minimal example). Also, I have managed to download the peak_o_mat software and will try and use it with my data. I will let you know how I get along with it. – Giuseppe_C May 11 '20 at 09:26
  • @ChristianK. I tried to use peak-o-mat for batch processing of my data, but I am not sure where to start given the small amount of documentation available. The software looks very cool and it is exactly what I need, if only I were able to use it... Any help would be great! – Giuseppe_C May 12 '20 at 19:20
  • @GiuseppeCiccone Please write me by mail (on the qceha.net web page). – Christian K. May 12 '20 at 23:24

1 Answer


With some small modifications the code of the OP runs just fine. Now it looks like:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
from scipy.signal import find_peaks

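# single Lorentzian peak: centre x0, amplitude a, half width (HWHM) gam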
def lorentzian( x, x0, a, gam ):
    return a * gam**2 / ( gam**2 + ( x - x0 )**2 )

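# constant offset plus a sum of Lorentzians; params = [ off, x0_1, a_1, gam_1, x0_2, a_2, gam_2, ... ]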
def multi_lorentz( x, params ):
    off = params[0]
    paramsRest = params[1:]
    assert not ( len( paramsRest ) % 3 )
    return off + sum( [ lorentzian( x, *paramsRest[ i : i+3 ] ) for i in range( 0, len( paramsRest ), 3 ) ] )

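# residuals (model minus data) for leastsq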
def res_multi_lorentz( params, xData, yData ):
    diff = [ multi_lorentz( x, params ) - y for x, y in zip( xData, yData ) ]
    return diff

y0 = np.loadtxt( 'y_peo.tsv' )   # untouched copy of the data, used later for plotting on the original scale
yData = np.loadtxt( 'y_peo.tsv' )
xData = np.arange( len( yData ) )

# shift and scale the data to the range [0, 1]; this usually makes fitting easier
yGround = min( yData )
yData = yData - yGround
yAmp = max( yData )
yData = yData / yAmp

#initial properties of peaks 
pk, properties = find_peaks( yData, height = .05, width = 3 )#, prominence=0.1 )
#extract peak heights and fwhm 
I = properties [ 'peak_heights' ]
fwhm = properties[ 'widths' ]

guess = [0]    # the first parameter is the overall y-offset
for i in range( len( pk ) ): 
    guess.append( pk[i] )
    guess.append( I[i] )
    guess.append( fwhm[i] ) 

guess=np.array( guess )

# with full_output=False, leastsq returns the solution and an integer status flag (not a covariance matrix)
popt, ier = leastsq( res_multi_lorentz, x0=guess, args=( xData, yData ) )
print( popt )


testData = [ multi_lorentz( x, popt ) for x in xData ]                   # fit on the normalized scale
fitData = [ yGround + yAmp * multi_lorentz( x, popt ) for x in xData ]   # fit rescaled to the original data

fig= plt.figure( figsize=( 10, 5 ) )
ax= fig.add_subplot( 2, 1, 1 )
bx= fig.add_subplot( 2, 1, 2 )
ax.plot( pk, yData[pk], 'o', ms=5 )
ax.plot( xData, yData, 'ok', ms=1 )
ax.plot( xData, testData , 'r--', lw=0.5 )
bx.plot( xData, y0, ls='', marker='o', markersize=1 )
bx.plot( xData, fitData )
plt.show()

giving

[9.39456104e-03 6.55864388e+01 5.73522507e-02 5.46727721e+00
 1.21329586e+02 2.97187567e-01 2.12738107e+00 1.76823266e+02
 1.57244131e-01 4.55424037e+00 3.51556692e+02 3.08959955e-01
 4.39114581e+00 4.02954496e+02 9.02677035e-01 2.27647259e+00
 4.53994668e+02 3.74379310e-01 4.02432791e+00 6.15694190e+02
 4.51943494e-01 4.06522919e+00 6.63307635e+02 1.05793098e+00
 2.32168786e+00 7.10644233e+02 4.33194434e-01 4.17050014e+00
 8.61276198e+02 1.79240633e-01 4.26617114e+00 9.06211127e+02
 5.03070470e-01 2.10563379e+00 9.50973864e+02 1.78487912e-01
 4.01254815e+00]

and

[plot: normalized data with the fit (top) and original data with the rescaled fit (bottom)]
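
For inspecting the individual peaks, here is a minimal sketch reusing lorentzian, xData, yData and popt from above (popt holds the offset first, followed by one (x0, a, gam) triplet per peak):

# overlay each fitted Lorentzian on the normalized data
off = popt[0]
fig2 = plt.figure( figsize=( 10, 5 ) )
cx = fig2.add_subplot( 1, 1, 1 )
cx.plot( xData, yData, 'ok', ms=1 )
for i in range( 1, len( popt ), 3 ):
    x0, a, gam = popt[ i : i + 3 ]
    cx.plot( xData, off + lorentzian( xData, x0, a, gam ), lw=0.75 )
plt.show()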

mikuszefski
  • Thank you so much, this indeed works very well (and it is quite fast) with my data. Would you be able to explain what the code does exactly (e.g., provide some comments on the code lines that are not so straightforward)? Thank you in advance for your invaluable help, greatly appreciated. – Giuseppe_C May 11 '20 at 09:32
  • @GiuseppeCiccone Well, I guess "not so straightforward" is very subjective. First of all, looking back at your code, I think the main problem is not having a general y-offset. Probably the fit code introduces a very broad peak to account for your base signal (a minimal sketch of adding such an offset to the original curve_fit code is shown below the comments). Apart from that, I changed from `curve_fit` to `leastsq`, as a variable number of input parameters is sort of expected here. On the other hand, you have to write the residual function yourself. On top of that, for simplification I scale and shift the data, which very often makes fitting easier. Feel free to ask specific questions. – mikuszefski May 11 '20 at 10:58
  • Thanks @mikuszefski. I understand the code now. I ran it for batch processing of my spectra and it seems to work quite smoothly - except it misses some of the very small peaks - usually the outermost 2 (I tried to change the peak height in the _find_peaks_ function, but this way _leastsq_ reached _maxfev_ and did not complete the fit successfully). If I fit a smaller part of the spectrum (i.e. the 6 central peaks), the code runs smoothly. Do you think using _find_peaks_ is a good way of estimating the initial parameters, or would there be a more efficient way? – Giuseppe_C May 12 '20 at 19:18
  • @GiuseppeCiccone Hi, as you noticed, my original code is quite slow compared to the `find_peaks` approach. As I mentioned, I changed it to work with your data, but also noticed that the smallest peaks are not detected. In my case that is strongly due to the fact that your peaks deviate from the expected shape. My approach would probably combine two things. First go as now. Then remove the data in the vicinity of the peaks, maybe smooth, and then look for the remaining peaks. Can you post some data where the algorithm fails? – mikuszefski May 13 '20 at 05:52
  • Note, there will always be a limit of detection. As the human brain is fantastic at pattern recognition, you will see that it is there, but the algorithm will just not find it. You can help it with some additional information. Do you always expect the peaks to come in a set of three? Is the gap between the three the same, or at least very similar? All this makes coding much harder, but may allow for small-peak detection. – mikuszefski May 13 '20 at 05:53
  • Finally, even when the position finder for the initial values detects the peak, the optimization algorithm might put it somewhere completely different, especially if the peak shape is not really correct. Is it a Voigt, maybe? – mikuszefski May 13 '20 at 05:56
  • Thanks @mikuszefski. Here is some data where the algorithm fails: https://drive.google.com/file/d/1CA8WFX5T1CLFXsr97qH5S8PP02YotsNw/view?usp=sharing (this is the resulting spectrum: https://drive.google.com/file/d/1GEScwOWGmOAFnjkKhluszb0LvukFjKUb/view?usp=sharing). Sometimes only one of the small peaks is not found; some others all 4 (as in this case). Let me know if you need more example data. – Giuseppe_C May 13 '20 at 09:57
  • yes, peaks always come as a set of 3. The gap between each central peak and the next central peak is constant. Does this help? Also, the shift between the central peaks and the two lateral peaks is the same for each spectrum, and should not vary significantly between spectra. Thank you. – Giuseppe_C May 13 '20 at 10:04
  • finally, the peak shape is a Lorentzian (as described by the Physics of Brillouin spectroscopy); although a damped harmonic oscillator (data analysis section of this paper https://www.nature.com/articles/lsa2017139.pdf) might describe the shape better. – Giuseppe_C May 13 '20 at 10:07
  • The peaks in the recent data seem very asymmetric....is there a lock-in time constant that does not go well with your scanning or so? – mikuszefski May 15 '20 at 07:52
  • @mikuszefski, I have edited the code to include the small peaks too, but it does not run as expected. Essentially, when the number of peaks is < 12, I split the spectrum into smaller spectra (each containing a triplet of peaks) and find their properties that way. Then the code is the same as before. However, it still does not run as expected... I can contact you by email if you are willing to take a look at my code. Thanks for your help. – Giuseppe_C May 15 '20 at 13:51
  • 1
    I have a similar approach tested and it works with the data you provided. Feel free to write me. However, I noticed that the small peaks are not fitted well. This is due to the fact that the peak shape is not Lorentzian, such that that a chi² algorithm uses the additional function to correct the large peaks rather then to fit the small ones. Any ideas what influences your peak shape to provide a better model? – mikuszefski May 18 '20 at 06:37
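
As a minimal sketch of the offset idea discussed in the comments, the original curve_fit approach from the question could be extended with a constant baseline. This reuses x_data, y_data and guess from the question; the names func_offset and guess_off are placeholders introduced only for this example.

# sum of Lorentzians plus a constant baseline; the baseline is the last parameter
def func_offset( x, *params ):
    off = params[-1]
    y = np.zeros_like( x, dtype=float ) + off
    for i in range( 0, len( params ) - 1, 3 ):
        x_0, amp, width = params[ i ], params[ i + 1 ], params[ i + 2 ]
        y = y + ( amp * width**2 ) / ( width**2 + ( x - x_0 )**2 )
    return y

# initial guess: the triplets from find_peaks plus a starting baseline of zero
guess_off = np.append( guess, 0.0 )

popt_off, pcov_off = curve_fit( func_offset, x_data, y_data, p0=guess_off, maxfev=100000 )

fig = plt.figure( figsize=( 10, 5 ) )
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( x_data, y_data, 'ok', ms=1 )
ax.plot( x_data, func_offset( x_data, *popt_off ), 'r--', lw=0.5 )
plt.show()

Whether this alone recovers the last triplet depends on the data; normalizing the data and using leastsq, as done in the answer above, remains the more robust route.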