
I have a data set described by an integral with unknown constants that I am attempting to determine using Python's curve_fit. However, the integrand contains a function multiplied by a data set:

import numpy as np

def integrand(tm, Pm, args):
    dt, alpha1, alpha2 = args
    return Pm * (1 - np.exp(-alpha1 * (tm + dt))) * np.exp(-alpha2 * (tm + dt))

where Pm is a 1-D array of collected impulse-response data. (Image: impulse data and integral curve.)

The orange curve represents the impulse data and the blue curve is what the integral should evaluate to.

tm is the variable of integration; dt, alpha1, and alpha2 are unknown constants; and the bounds of integration run from 0 to tm.
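Written out, the model I am trying to fit is (using $\tau$ for the integration variable):

$$\mathrm{output}(t) = \int_0^{t} P_m(\tau)\,\left(1 - e^{-\alpha_1(\tau + dt)}\right)\,e^{-\alpha_2(\tau + dt)}\,d\tau$$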

What would be the best way to perform a curve fit on an integral of this kind, or are there other ways to solve for the unknown constants?

The data sets are linked here.


1 Answer


From the length of the data sets, it seems that the intent is to fit integrand(t) to output(t+dt). Several functions in the scipy.optimize module can be used to do this. As a simple example, we show an implementation using scipy.optimize.leastsq(). For further details see the scipy.optimize tutorial.

The basic scheme is to create a function that evaluates the model over the independent coordinate and returns a numpy array of residuals, the differences between the model and the observations at each point. leastsq() then finds the values of the parameters that minimize the sum of the squares of the residuals.

We note as a caveat that the fit can be sensitive to the initial guess. Simulated annealing is often used to find a likely global minimum and to provide a rough estimate of the fit parameters before refining the fit (a sketch of one way to do this follows the code below). The values used here for the initial guess are notional only.

from scipy.optimize import leastsq
import numpy as np

# Read the data files
Pm = np.array([float(v) for v in open("impulse_data.txt", "r").readlines()])
print(type(Pm), Pm.shape)

tm = np.array([float(v) for v in open("Impulse_time_axis.txt", "r").readlines()])
print(type(tm), tm.shape)

output = np.array([float(v) for v in open("Output_data.txt", "r").readlines()])
print(type(output), output.shape)

tout = np.array([float(v) for v in open("Output_time_axis.txt", "r").readlines()])
print(type(tout), tout.shape)

# Define the function that calculates the residuals
def residuals(coeffs, output, tm):   # Pm and tout are read from the enclosing scope
    dt, alpha1, alpha2 = coeffs
    res = np.zeros( tm.shape )
    for n,t in enumerate(tm):
        # integrate to "t": running sum of the integrand over the samples so far
        value = np.sum(Pm[:n] * (1 - np.exp(-alpha1 * (tm[:n] + dt))) * np.exp(-alpha2 * (tm[:n] + dt)))
        # retrieve the observed output at t + dt
        n1 = (np.abs(tout - (t + dt))).argmin()
        # construct the residual
        res[n] = output[n1] - value
    return res

# Initial guess for the parameters
x0 = (10., 1., 1.)

# Run the least squares routine
coeffs, flag = leastsq( residuals, x0, args=(output,tm) )

# Report the results
print(coeffs)
print(flag)
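
As a preliminary annealing round to find a reasonable initial guess, one option is scipy.optimize.dual_annealing() (available in SciPy 1.2 and later). The sketch below reuses the residuals() function above; the parameter bounds are notional and would need to be adapted to the data.

from scipy.optimize import dual_annealing

# Scalar cost for the annealer: the sum of the squared residuals
def sum_sq(coeffs):
    r = residuals(coeffs, output, tm)
    return np.sum(r * r)

# Assumed (notional) search ranges for dt, alpha1, alpha2
bounds = [(0., 100.), (0., 10.), (0., 10.)]
result = dual_annealing(sum_sq, bounds, maxiter=100)

# Refine the annealer's best point with leastsq()
coeffs, flag = leastsq(residuals, result.x, args=(output, tm))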
  • Attempting this method led to the error response: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() ". On your suggestion I've included a link to the relevant data sets – J.zendejas Nov 14 '18 at 20:45
  • Okay, this was meant to be a schematic of how to go about it. I'll try to find some time to work up a specific example using your data. – DrM Nov 14 '18 at 23:10
  • @J.zendejas. That's it, let's see if that does it for you. You will need a reasonable guess for the parameters. It's a lengthy fit because of the integral, in essence a loop over t at each t. – DrM Nov 15 '18 at 03:18
  • I haven't been able to get it to work completely yet, but it has helped make progress, I'll keep working with it, thank you for your help! – J.zendejas Nov 28 '18 at 09:54
  • It might be worthwhile to try simulated annealing as a preliminary round, and then refine the result with least squares. But, be aware that some annealers do a refinement on each candidate configuration of the parameters (I recall that the annealer in the optimize module does it this way). That's often unnecessary, and might make it too expensive (i.e., slow) for your problem. You most likely want to save the least squares until after you select the configuration that corresponds to the best local minimum. I will try to find some time to look into this a bit and perhaps edit the answer. – DrM Nov 29 '18 at 13:58
  • I see that you are a physics student at UC Berkeley. It would be a good exercise for you to write a simulated annealer. Here's how: randomly sample your parameter space in a region around your current configuration by adding random increments (within some range) to each and every parameter. For each such configuration, calculate the sum of the squares of the residuals. If it is your best so far, keep it. Then scale down the range of your random increments (i.e. "lower the temperature") and repeat. (A minimal sketch along these lines follows these comments.) – DrM Nov 29 '18 at 14:08
  • While working on this, I was able to find a way to complete the integration and get the curve fitter to work with it, though it performs quite poorly without the right initial guess. As per your suggestion I've written up a simulated annealer based on a simple [outline](http://katrinaeg.com/simulated-annealing.html) and applied it here. Testing the code, though, it seems to be very inefficient: at this point it runs for some indeterminately long time. The code is [here](https://drive.google.com/open?id=1E0jkzNc2GihXcq8izvW3VkEMlKTRAGEH) – J.zendejas Dec 09 '18 at 21:45
  • That may be because it's an inherently numerically expensive problem, or you might have a bug, or you might be doing it inefficiently. Use numpy vector operations where you can. Use copious print statements to check that your configurations, residuals, and sum of the squared residuals make sense and respond to each temperature "bump", and that you are adequately sampling the parameter space before reducing the temperature. – DrM Dec 09 '18 at 22:21
  • After 90 years of advances in the technology of digital computing, the humble print statement is still very often one of the most important tools in debugging something. – DrM Dec 09 '18 at 22:29
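
For reference, here is a minimal sketch of the greedy annealer described in the comment above. It keeps only the best configuration so far, as described there; the step, cooling, and iteration values are notional, and it reuses the sum_sq cost from the earlier sketch.

import numpy as np

def anneal(cost, x0, step=1.0, cooling=0.9, n_temps=20, n_tries=50, seed=None):
    # Sample random increments around the best point, keep improvements,
    # then shrink the sampling range ("lower the temperature") and repeat.
    rng = np.random.default_rng(seed)
    best_x = np.asarray(x0, dtype=float)
    best_cost = cost(best_x)
    for _ in range(n_temps):
        for _ in range(n_tries):
            trial = best_x + rng.uniform(-step, step, size=best_x.shape)
            c = cost(trial)
            if c < best_cost:          # keep the best configuration so far
                best_x, best_cost = trial, c
        step *= cooling                # reduce the range of the random increments
    return best_x, best_cost

# For example: x0_rough, _ = anneal(sum_sq, (10., 1., 1.))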