0

Say we have an NxM array. From this, I want to unpack the N rows as 1xM arrays and pass them to a Python function as separate arguments.

An example scenario would be the SciPy package optimize.curve_fit, which has the rather productivity-killing requirement that N independent variables must be passed as a numpy.raveled and numpy.vstacked NxM-array (which they call xdata) and that the fit function also needs to take a single NxM-array as argument. The obvious problem with this is that virtually all N-dimensional model functions are canonically defined as f(x1, ..., xN, params) and not f(xdata, params). As a matter of fact, optimize.curve_fit even requires the fit function to be first numpy.raveled, i.e. it wants numpy.ravel(f(xdata, params)) instead of just f(x1, ..., xN, params). Whoever came up with this...

Anyway, as a consequence, if one doesn't want to waste time juggling data structures for each curve fit, one needs a small function that can be plugged in front of optimize.curve_fit that handles this unpacking and repacking (which really is a job optimize.curve_fit should be doing in the first place), so that the model functions can be defined in a canonical, mathematically clean way. Same goes for the fit data, which obviously is natively generated as columns of data points e.g. in a CSV file (no measurement device in the world generates NxM-arrays).

I've already done the part that repacks N data columns of length M into these NxM-arrays (as mentioned, one can use numpy.ravel and numpy.vstack). The problem is, how do I interface my fit functions f(x1, ..., xN, params)?

If I do this

def unpack(f, *args):
    return f(args)

then Python will not stop at splitting the NxM-array into N arrays of length M, but will "explode" the array all the way down into an array of N arrays of M arrays of length 1. Is there a way to limit the level of unpacking of nested arguments?

Alternatively, if starred expressions won't work, how about this:

def unpack(f, xdata):
    import numpy as np
    N_rows = xdata.shape[0]
    args = [???]    # <=== How to do this, i.e. from N rows generate N arguments?
    return f(args)

UPDATE: Following Jake Levi's answer, I'm trying this:

def repack_for_curve_fit(f, xdata):
    '''
    This function unpacks the NxM-array with independent data for N predictors into N separate 1xM-arrays,
    passes them to the fit function as the first N arguments, and returns the flattened fit function for
    further processing with the scipy.optimize.curve_fit package.
    
    Parameters
    ----------
    f : callable
        N-dimensional fit function, f(x1, ..., xN, *params), must take N independent variables as first N
        arguments and the fit parameters as remaining arguments
    xdata : array_like
        NxM-array with independent data for N predictors
    '''
    import numpy as np
    return np.ravel( f(*(x.reshape(1, -1) for x in xdata)) )

However, I'm getting an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-51aaf66438c3> in <module>()
     26     # Do nonlinear curve fit using scipy.optimize.curve_fit, write fit results to TXT file
     27     params_init = (-7, -7, 0, 7)    # Specify initial guesses for fit parameters (default is 0)
---> 28     params_opt, params_cov = curve_fit(repack_for_curve_fit(fit_function, xyGrid), xyGrid, imgArrayFlattened, params_init)    # Run the fit, return optimized parameters and covariance matrix
     29     params_err = np.sqrt(np.diag(params_cov))    # Standard deviation of optimized parameters
     30     print(params_opt)

/home/davidra/Desktop/repack_for_curve_fit.py in repack_for_curve_fit(f, xdata)
     14     '''
     15     import numpy as np
---> 16     return np.ravel( f(*(x.reshape(1, -1) for x in xdata)) )

TypeError: fit_function() missing 4 required positional arguments: 'x0', 'y0', 'Imax', and 'FWHM'

This is the fit function:

def fit_function(x, y, x0, y0, Imax, FWHM):
    '''
    Parameters
    ----------
    x, y : float
        X and Y coordinates
    x0 : float
        X offset
    y0 : float
        Y offset
    Imax : float
        Peak intensity
    FWHM : float
        Full width at half maximum
    '''
    import numpy as np
    
    return Imax * np.e * 4 * np.log(2) * ((x+x0)**2 + (y+y0)**2) / FWHM**2 * np.exp(-4 * np.log(2) * ((x+x0)**2 + (y+y0)**2) / FWHM**2)

The fit_function is proven to work. Where is the problem?

Furthermore, do I really need to use reshape? Won't the list/tuple comprehension already by itself slice out the rows from the xdata array?

david
  • 201
  • 2
  • 10
  • If `x` is 1d array, then `f(*x.reshape(n,m))` will pass 'n` arguments to `f`. In effect, the `*` makes a list from the array, iterating on the `n` rows. That kind of iteration is sub-optimal. `reshape` and `ravel` are fast `numpy` operations, row iteration and such is slower. – hpaulj Sep 07 '20 at 18:28
  • `reshape` to (n,1,m) if you want iteration the first dimension to yield (1,m) arrays. – hpaulj Sep 08 '20 at 00:18
  • Thx for the suggestion. Why do we need `reshape` here, won't list comprehension automatically slice out the rows from the array? This is also what I don't quite understand in [Jake Levi's answer](https://stackoverflow.com/a/63780943/14236070) yet... – david Sep 08 '20 at 04:36
  • Depends on what your function requires, a (M,) or (1,M). The first line says 1xM. You may not appreciate `numpy's` flexibility with regard to dimensions. – hpaulj Sep 08 '20 at 05:57
  • @hpaulj: Hi, all fit functions are in their familiar form f(x1, ..., xN) while `curve_fit` requires an NxM `meshgrid`, plus, the function itself needs to be flattened. So, all in all: `curve_fit` insists on getting the independent data as an NxM `meshgrid` called `xdata`, the dependent data flattened as a 1xM array, but the **functions** are defined in their mathematically usual way, i.e. as f(x1, ..., xN, params), but must be passed as `numpy.ravel(f(xdata, params))`. SciPy basically stops short of handling native data and functions, passing the workload on to the user (without any benefit) – david Sep 08 '20 at 14:56

1 Answers1

0

Is this the kind of thing you're looking for, IE unpacking the N rows from an NxM array as 1xM arrays and passing them to a Python function as separate arguments, as shown in the bottom line in the following snippet? It's using an unpacked generator expression (but you can also replace it with an unpacked list comprehension by replacing the parentheses after the * with square brackets if you prefer).

import numpy as np

def useful_function(*args):
    """
    This useful function takes N arguments, each of which should be a 1*M array,
    and prints each argument to the console, along with its shape. N and M are
    both variable.
    """
    for i, a in enumerate(args):
        print("Argument {} is {}, with shape {}".format(i, a, a.shape))


n, m = 5, 4
n_x_m_array = np.arange(n*m).reshape(n, m)

print("Input NxM array:")
print(n_x_m_array)

useful_function(*(row.reshape(1, -1) for row in n_x_m_array))

Output:

Input NxM array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
Argument 0 is [[0 1 2 3]], with shape (1, 4)
Argument 1 is [[4 5 6 7]], with shape (1, 4)
Argument 2 is [[ 8  9 10 11]], with shape (1, 4)
Argument 3 is [[12 13 14 15]], with shape (1, 4)
Argument 4 is [[16 17 18 19]], with shape (1, 4)

RE your dissatisfaction about the SciPy interface, maybe opening an issue here would be a more appropriate place to discuss this? It's worth noting also that a lot of these scenarios with reformatting input arguments to an API function can often be solved quite easily with the right combination of sequence unpacking, list comprehensions, generator expressions, and/or lambda expressions, if you're familiar with these and know the right tricks.

Jake Levi
  • 1,329
  • 11
  • 16
  • Okay, for some reason, this does **not** work. Prepending `*` to a list/tuple comprehension throws `SyntaxError: can't use starred expression here` – david Sep 07 '20 at 20:12
  • ...and adding `.reshape(1, -1)` doesn't fix it. This is my toy example: `lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] arr = np.array(lst) args = *[arg.reshape(1, -1) for arg in arr] print(args)` – david Sep 07 '20 at 20:21
  • I've appended your suggestion to the question, but can't accept your answer as it doesn't work, sry – david Sep 08 '20 at 16:16
  • I think it doesn't work in the code in your comment on my answer because you are trying to unpack the list comprehension while assigning the unpacked sequence to the variable args, and then print the value of args. But in my answer I think it works because I am using the unpacked generator expression directly as the arguments to the function. Have you tried this instead? – Jake Levi Sep 08 '20 at 22:50
  • Yes...at least I think that I have, see addendum to question (especially line 28 in the error message). I've defined a function according to your suggestion and used it as an argument to `curve_fit` – david Sep 08 '20 at 22:57
  • In fact, it might be easier to answer your question if it contains one minimal reproducible example as a code block of using the curve_fit function correctly, and one minimal reproducible example as another code block of how you want to call curve_fit with the NxM arrays that you want to unpack into row-vectors, along with the error that you're observing when you call it this way – Jake Levi Sep 08 '20 at 23:42
  • Minimal in the sense of not containing any code which is unnecessary in demonstrating what you're trying to demonstrate, reproducible in the sense that I could copy and paste it into a script and run it correctly on my PC. Like the code block in my answer above – Jake Levi Sep 08 '20 at 23:43
  • On a separate note, I think in general it is better practise to just import numpy once at the top of each Python module that uses numpy, rather than importing numpy as part of a function definition – Jake Levi Sep 08 '20 at 23:44