Say we have an NxM array. From this, I want to unpack the N rows as 1xM arrays and pass them to a Python function as separate arguments.
An example scenario is SciPy's `optimize.curve_fit`, which has the rather productivity-killing requirement that the N independent variables must be passed as a `numpy.ravel`ed and `numpy.vstack`ed NxM-array (which it calls `xdata`), and that the fit function also has to take a single NxM-array as argument. The obvious problem with this is that virtually all N-dimensional model functions are canonically defined as `f(x1, ..., xN, params)` and not `f(xdata, params)`. As a matter of fact, `optimize.curve_fit` even requires the fit function's output to be `numpy.ravel`ed, i.e. it wants `numpy.ravel(f(xdata, params))` instead of just `f(x1, ..., xN, params)`. Whoever came up with this...
Anyway, as a consequence, if one doesn't want to waste time juggling data structures for each curve fit, one needs a small wrapper function that can be plugged in front of `optimize.curve_fit` to handle this unpacking and repacking (which really is a job `optimize.curve_fit` should be doing in the first place), so that the model functions can be defined in a canonical, mathematically clean way. The same goes for the fit data, which is naturally generated as columns of data points, e.g. in a CSV file (no measurement device in the world generates NxM-arrays).
I've already done the part that repacks N data columns of length M into these NxM-arrays (as mentioned, `numpy.ravel` and `numpy.vstack` do the job). The problem is: how do I interface my fit functions `f(x1, ..., xN, params)`?
If I do this
    def unpack(f, *args):
        return f(args)
then Python will not stop at splitting the NxM-array into N arrays of length M, but will "explode" the array all the way down into an array of N arrays of M arrays of length 1. Is there a way to limit the level of unpacking of nested arguments?
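For reference, here is a quick sanity check of what star-unpacking actually does to a 2-D NumPy array (the toy function `f` and the values are only illustrative):

```python
import numpy as np

xdata = np.arange(6).reshape(2, 3)  # 2x3 array: rows [0 1 2] and [3 4 5]

def f(x1, x2):
    # each argument arrives as a 1-D row of length 3
    return x1 + x2

result = f(*xdata)  # star-unpacking splits along the first axis only
```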
Alternatively, if starred expressions won't work, how about this:
    def unpack(f, xdata):
        import numpy as np
        N_rows = xdata.shape[0]
        args = [???]  # <=== How to do this, i.e. from N rows generate N arguments?
        return f(*args)  # args should expand into N separate arguments
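One possible way to fill in that placeholder (just a sketch of the intent; the two-row data and the lambda standing in for a real fit function are illustrative):

```python
import numpy as np

def unpack(f, xdata):
    # iterating a 2-D array yields its rows, i.e. N 1-D arrays of length M
    args = [x for x in xdata]
    return f(*args)  # expand the list so the rows arrive as N separate arguments

xdata = np.vstack([np.linspace(0, 1, 5), np.linspace(1, 2, 5)])  # 2x5 example
total = unpack(lambda x1, x2: x1 + x2, xdata)
```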
UPDATE: Following Jake Levi's answer, I'm trying this:
    def repack_for_curve_fit(f, xdata):
        '''
        Unpack the NxM-array with independent data for N predictors into N separate
        1xM-arrays, pass them to the fit function as its first N arguments, and return
        the flattened fit function output for further processing with
        scipy.optimize.curve_fit.

        Parameters
        ----------
        f : callable
            N-dimensional fit function, f(x1, ..., xN, *params); must take the N
            independent variables as its first N arguments and the fit parameters
            as the remaining arguments
        xdata : array_like
            NxM-array with independent data for N predictors
        '''
        import numpy as np
        return np.ravel( f(*(x.reshape(1, -1) for x in xdata)) )
However, I'm getting an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-39-51aaf66438c3> in <module>()
26 # Do nonlinear curve fit using scipy.optimize.curve_fit, write fit results to TXT file
27 params_init = (-7, -7, 0, 7) # Specify initial guesses for fit parameters (default is 0)
---> 28 params_opt, params_cov = curve_fit(repack_for_curve_fit(fit_function, xyGrid), xyGrid, imgArrayFlattened, params_init) # Run the fit, return optimized parameters and covariance matrix
29 params_err = np.sqrt(np.diag(params_cov)) # Standard deviation of optimized parameters
30 print(params_opt)
/home/davidra/Desktop/repack_for_curve_fit.py in repack_for_curve_fit(f, xdata)
14 '''
15 import numpy as np
---> 16 return np.ravel( f(*(x.reshape(1, -1) for x in xdata)) )
TypeError: fit_function() missing 4 required positional arguments: 'x0', 'y0', 'Imax', and 'FWHM'
This is the fit function:
    def fit_function(x, y, x0, y0, Imax, FWHM):
        '''
        Parameters
        ----------
        x, y : float
            X and Y coordinates
        x0 : float
            X offset
        y0 : float
            Y offset
        Imax : float
            Peak intensity
        FWHM : float
            Full width at half maximum
        '''
        import numpy as np
        return Imax * np.e * 4 * np.log(2) * ((x+x0)**2 + (y+y0)**2) / FWHM**2 * np.exp(-4 * np.log(2) * ((x+x0)**2 + (y+y0)**2) / FWHM**2)
The `fit_function` is proven to work. Where is the problem?
Furthermore, do I really need to use `reshape`? Won't the list/tuple comprehension already by itself slice the rows out of the `xdata` array?
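A quick shape check (with throwaway values) illustrates the difference between the two variants:

```python
import numpy as np

xdata = np.vstack([np.arange(4), np.arange(4, 8)])  # 2x4 example array

rows = [x for x in xdata]                     # plain iteration: 1-D rows, shape (4,)
reshaped = [x.reshape(1, -1) for x in xdata]  # with reshape: 2-D rows, shape (1, 4)

row_shapes = [r.shape for r in rows]
reshaped_shapes = [r.shape for r in reshaped]
```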