4

4I am currently trying to fit a lot of data to a sine function. In the case where I only have one set of data (1D array), scipy.optimize.curve_fit() works fine. However it does not permit a higher dimensional data input if the function itself is only one dimensional as far as i can see. I don't want to iterate over the array using for loops as that works incredibly slow in python.

My code so far should look similar to this:

from scipy import optimize
import numpy as np    
def f(x,p1,p2,p3,p4): return p1 + p2*np.sin(2*np.pi*p3*x + p4)      #fit function

def fit(data,guess):
   n = data.shape[0] 
   leng = np.arange(n)
   param, pcov = optimize.curve_fit(f,leng,data,guess)
   return param, pcov

where data is a threedimensional array (shape=(x,y,z)) and I would like to fit each line data[:,a,b] to the function with param being a (4,y,z) shaped array as output. Of course, for multidimensional data this results in a

ValueError: operands could not be broadcast together with shapes (2100,2100) (5)

Maybe there is an easy solution to this but I am not sure how to do it. Any suggestions?

Searching for an answer to my question proofed quite difficult since most topics with those keywords relate to the fitting of higher dimensional functions.

Tetsujin no Oni
  • 7,300
  • 2
  • 29
  • 46
Linus-
  • 119
  • 2
  • 10
  • don't worry about for loops being small. I'm pretty sure curve_fitting will be the slow part of your code anyway. Profile the code if you suspect that the loop is a bottleneck! – David Zwicker Feb 26 '13 at 17:03
  • Well, the idea was that the curve_fitting would be much faster if it can use the whole array instead of running the function y*z times. That’s what i meant when saying that for loops are slow. – Linus- Feb 26 '13 at 18:04
  • Maybe you can get the parameters of sin by FFT. – HYRY Feb 27 '13 at 00:28
  • Why should curve fitting being _much_ faster? The majority of the time will likely be spent in the fitting routine and not in the looping over the data. – David Zwicker Feb 27 '13 at 08:25
  • I expected it to be faster because that is what I experienced with other routines when comparing the for loop approach to go over an array of data with directly feeding the whole array into the function (up to a factor 100). I admit that those were rather simple calculations and maybe the time saved when using a more time consuming method like the curve_fit may be less crucial, but I was still hoping for a little accelaration of the process. I tried using FFT as well but due to the little amount of sampling points I have, the result was not quite satisfactory. – Linus- Feb 27 '13 at 13:33
  • Nevertheless, I will give FFT another try in case I find no other solution to the problem. – Linus- Feb 27 '13 at 13:35

1 Answers1

4

Using np.apply_along_axis() solves your problem. Just do this:

func1d = lambda y, *args: optimize.curve_fit(f, xdata=x, ydata=y, *args)[0] #<-- [0] to get only popt
param = np.apply_along_axis( func1d, axis=2, arr=data )

See the example below:

from scipy import optimize
import numpy as np
def f(x,p1,p2,p3,p4):
    return p1 + p2*np.sin(2*np.pi*p3*x + p4)
sx = 50  # size x
sy = 200 # size y
sz = 100 # size z
# creating the reference parameters
tmp = np.empty((4,sy,sz))
tmp[0,:,:] = (1.2-0.8) * np.random.random_sample((sy,sz)) + 0.8
tmp[1,:,:] = (1.2-0.8) * np.random.random_sample((sy,sz)) + 0.8
tmp[2,:,:] = np.ones((sy,sz))
tmp[3,:,:] = np.ones((sy,sz))*np.pi/4
param_ref = np.empty((4,sy,sz,sx))     # param_ref in this shape will allow an
for i in range(sx):                    # one-shot evaluation of f() to create 
    param_ref[:,:,:,i] = tmp           # the data sample
# creating the data sample
x = np.linspace(0,2*np.pi)
factor = (1.1-0.9)*np.random.random_sample((sy,sz,sx))+0.9
data = f(x, *param_ref) * factor       # the one-shot evalution is here
# finding the adjusted parameters
func1d = lambda y, *args: optimize.curve_fit(f, xdata=x, ydata=y, *args)[0] #<-- [0] to get only popt
param = np.apply_along_axis( func1d, axis=2, arr=data )
Saullo G. P. Castro
  • 56,802
  • 26
  • 179
  • 234
  • 3
    Thank you for your answer, this does work. Unfortunately it does not speed up the process compared to the method with for loops. I guess David Zwicker was right about the fit function requiring the main amount of computing time. – Linus- Jun 21 '13 at 12:38
  • 1
    yes, you are right, apparently `apply_along_axis`, `vectorize` and `apply_over_axes` do not provide any performance gain compared to a Python `for` loop, but they do simplify the code... – Saullo G. P. Castro Jun 21 '13 at 13:02