
I'm trying to fit some data from a simulation code I've been running in order to figure out a power law dependence. When I plot a linear fit, the data does not fit very well.

Here's the python script I'm using to fit the data:

#!/usr/bin/env python
from scipy import optimize
import numpy as np

xdata = np.array([0.00010851, 0.00021701, 0.00043403, 0.00086806,
                  0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469, 19.97075977,
                  12.61276074, 7.12695312])

# Model: constant offset plus a power law
fitfunc = lambda p, x: p[0] + p[1] * x ** p[2]
errfunc = lambda p, x, y: y - fitfunc(p, x)

out, success = optimize.leastsq(errfunc, [1, -1, -0.5],
                                args=(xdata, ydata), maxfev=3000)

print("%g + %g*x^%g" % (out[0], out[1], out[2]))

the output I get is: -71205.3 + 71174.5*x^-9.79038e-05

While on the plot the fit looks about as good as you'd expect from a least-squares fit, the form of the output bothers me. I was hoping the constant would be close to where you'd expect the zero to be (around 30), and I was expecting a power dependence with a larger magnitude than 10^-5.

I've tried rescaling my data and playing with the parameters to optimize.leastsq with no luck. Is what I'm trying to accomplish possible or does my data just not allow it? The calculation is expensive, so getting more data points is non-trivial.

Thanks!

zje
  • In the docs, it looks like this function expects the `params` argument to be second, and the `xdata` argument to be first. I doubt this will change things, but can you give that a try and see what happens? – ely Apr 16 '12 at 20:35
  • N/M I just made this change and get the same results you do. It doesn't help you, but does show that these docs need to be much better. – ely Apr 16 '12 at 20:49
  • The only other thing I can think is: can you get the standard errors of these estimates back? In O.L.S. regression, there's a nice formula for the standard errors of the coefficients. With such a small data set, I can believe they are extremely large. You may just be seeing small sample size effects. Have you tried this with a larger data set, say ~100 observations? – ely Apr 16 '12 at 20:51
  • 1
    You may also want to ask at stats.stackexchange.com – ely Apr 16 '12 at 20:55
  • Thanks for all the tips! Should this be migrated or just asked as a new question? I feel like this may be more appropriate for the stats.stackexchange audience. – zje Apr 16 '12 at 21:13
  • I would say to re-ask over there. If you get any sensible answers, then come back and delete it here. If not, then just edit each post to include a link to the cross-posting. A moderator may come along and change it to one or the other. – ely Apr 16 '12 at 21:23
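Aside (not part of the original thread): the standard errors mentioned in the comments can be estimated from the covariance matrix that `optimize.leastsq` returns when called with `full_output=True`. This is only a sketch; the covariance is scaled by the reduced chi-square, as is conventional for unweighted residuals:

```python
import numpy as np
from scipy import optimize

xdata = np.array([0.00010851, 0.00021701, 0.00043403, 0.00086806,
                  0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469, 19.97075977,
                  12.61276074, 7.12695312])

# Same offset-plus-power-law model as in the question
errfunc = lambda p, x, y: y - (p[0] + p[1] * x ** p[2])

p, cov, info, msg, ier = optimize.leastsq(errfunc, [1, -1, -0.5],
                                          args=(xdata, ydata),
                                          maxfev=3000, full_output=True)

if cov is not None:
    # Scale the covariance by the reduced chi-square to get parameter variances
    dof = len(xdata) - len(p)
    s_sq = (errfunc(p, xdata, ydata) ** 2).sum() / dof
    stderr = np.sqrt(np.diag(cov * s_sq))
    print("standard errors:", stderr)
```

With only six data points and three parameters, these standard errors are expected to be very large, which is consistent with the small-sample concern raised above.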

3 Answers


It is much better to first take the logarithm of both x and y, then use least squares to fit a straight line to the transformed data; this gives a much better fit. There is a great example in the scipy cookbook, which I've adapted below to fit your data.

The best-fit parameters found this way are: amplitude = 0.8955 and index = -0.40943265484.

As we can see from the graph (and your data), if it's a power-law fit we would not expect the amplitude value to be near 30. In the power-law equation f(x) = amp * x ** index with a negative index, f(1) = amp and f(x) → infinity as x → 0.

[plot: power-law fit against the data, on linear axes (top) and log-log axes (bottom)]

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

xdata = np.array([0.00010851, 0.00021701, 0.00043403, 0.00086806,
                  0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469, 19.97075977,
                  12.61276074, 7.12695312])

logx = np.log10(xdata)
logy = np.log10(ydata)

# Define our (line) fitting function in log-log space
fitfunc = lambda p, x: p[0] + p[1] * x
errfunc = lambda p, x, y: y - fitfunc(p, x)

pinit = [1.0, -1.0]
out = optimize.leastsq(errfunc, pinit,
                       args=(logx, logy), full_output=1)

pfinal = out[0]
covar = out[1]

index = pfinal[1]
amp = 10.0 ** pfinal[0]

print('amp:', amp, 'index:', index)

powerlaw = lambda x, amp, index: amp * (x ** index)

##########
# Plotting data
##########
plt.clf()
plt.subplot(2, 1, 1)
plt.plot(xdata, powerlaw(xdata, amp, index))   # Fit
plt.plot(xdata, ydata)                         # Data
plt.text(0.0020, 30, 'Ampli = %5.2f' % amp)
plt.text(0.0020, 25, 'Index = %5.2f' % index)
plt.xlabel('X')
plt.ylabel('Y')

plt.subplot(2, 1, 2)
plt.loglog(xdata, powerlaw(xdata, amp, index))  # Fit
plt.plot(xdata, ydata)                          # Data
plt.xlabel('X (log scale)')
plt.ylabel('Y (log scale)')

plt.savefig('power_law_fit.png')
plt.show()
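Aside (not from the original answer): the same straight-line fit in log-log space can be done in a single call with `numpy.polyfit`, which minimizes the same sum of squared log-residuals and should reproduce the amplitude and index above:

```python
import numpy as np

xdata = np.array([0.00010851, 0.00021701, 0.00043403, 0.00086806,
                  0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469, 19.97075977,
                  12.61276074, 7.12695312])

# Fit a degree-1 polynomial to the log-log data:
# log10(y) = slope * log10(x) + intercept
slope, intercept = np.polyfit(np.log10(xdata), np.log10(ydata), 1)

index = slope
amp = 10.0 ** intercept
print('amp:', amp, 'index:', index)
```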
fraxel
  • Thank you very much, this was helpful. For now, I'm going to use unutbu's solution. However, I understand that passing data as close to linear as possible is beneficial for least squares fitting. Thanks again! – zje Apr 16 '12 at 22:50
  • @user825518 - cool, and yes, if you're looking for a power law with an offset, unutbu's method is a good approach! – fraxel Apr 17 '12 at 06:42

It helps to rescale xdata so the numbers are not all so small. You could work in a new variable xprime = 1000*x. Then fit xprime versus y.

Least squares will find parameters q fitting

y = q[0] + q[1] * (xprime ** q[2]) 
  = q[0] + q[1] * ((1000*x) ** q[2])

So let

p[0] = q[0]
p[1] = q[1] * (1000**q[2])
p[2] = q[2]

Then y = p[0] + p[1] * (x ** p[2])

It also helps to change the initial guess to something closer to your desired result, such as [max(ydata), -1, -0.5].

from scipy import optimize
import numpy as np

def fitfunc(p, x):
    return p[0] + p[1] * (x ** p[2])
def errfunc(p, x, y):
    return y - fitfunc(p, x)

xdata=np.array([ 0.00010851,  0.00021701,  0.00043403,  0.00086806,
                 0.00173611, 0.00347222])
ydata=np.array([ 29.56241016,  29.82245508,  25.33930469,  19.97075977,
                 12.61276074, 7.12695312])

N = 5000
xprime = xdata * N

qout,success = optimize.leastsq(errfunc, [max(ydata),-1,-0.5],
                               args=(xprime, ydata),maxfev=3000)

out = qout.copy()  # copy before back-transforming (slicing an array gives a view)
out[1] = qout[1] * (N**qout[2])
print("%g + %g*x^%g" % (out[0], out[1], out[2]))

yields

40.1253 + -282.949*x^0.375555
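As a quick sanity check (not part of the original answer), the back-substitution p[1] = q[1] * (N**q[2]) makes the fitted curve independent of the scale factor N, which is why rescaling by 500 or 5000 gives the same result. Using the fitted parameters reported above:

```python
import numpy as np

x = np.array([0.00010851, 0.00021701, 0.00043403, 0.00086806,
              0.00173611, 0.00347222])

# Back-transformed parameters reported above (model in the original variable x)
p = [40.1253, -282.949, 0.375555]

for N in (500, 1000, 5000):
    # Equivalent parameters in the rescaled variable xprime = N * x
    q = [p[0], p[1] / (N ** p[2]), p[2]]
    y_from_q = q[0] + q[1] * ((N * x) ** q[2])
    y_from_p = p[0] + p[1] * (x ** p[2])
    assert np.allclose(y_from_q, y_from_p)
```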

unutbu
  • I forgot that I had also rescaled `x`. I've edited the post to explain. – unutbu Apr 16 '12 at 22:26
  • Note you can replace the rescaling factor 1000 with 500 or 5000 and the result does not change (significantly). – unutbu Apr 16 '12 at 22:28
  • Got it, thanks! I had previously tried rescaling, but I suppose it still misbehaves without the decent guess for the constant term. – zje Apr 16 '12 at 22:38

The standard way to use linear least squares to obtain a power-law fit is to do what fraxel suggests in his/her answer: fit a straight line to log(y_i) versus log(x_i).

However, this method has known numerical disadvantages, particularly sensitivity: a small change in the data can yield a large change in the estimate. The preferred alternative is a nonlinear least-squares approach, which is less sensitive. But if you are satisfied with the linear LS method for non-critical purposes, just use it.
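A minimal sketch of the nonlinear approach using `scipy.optimize.curve_fit` (the starting values here are an assumption, seeded from the log-log estimate in fraxel's answer, and are not part of the original answer):

```python
import numpy as np
from scipy.optimize import curve_fit

def powerlaw(x, amp, index):
    return amp * x ** index

xdata = np.array([0.00010851, 0.00021701, 0.00043403, 0.00086806,
                  0.00173611, 0.00347222])
ydata = np.array([29.56241016, 29.82245508, 25.33930469, 19.97075977,
                  12.61276074, 7.12695312])

# Nonlinear least squares in the original (linear) space, seeded with the
# log-log linear estimate; curve_fit refines the parameters iteratively
popt, pcov = curve_fit(powerlaw, xdata, ydata, p0=[0.9, -0.41])
amp, index = popt
print('amp:', amp, 'index:', index)
```

Note that fitting in linear space weights the large-y (small-x) points more heavily than the log-log fit does, so the parameters will differ somewhat from fraxel's values.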

Steve Tjoa
  • 1
    Yes, I'm just trying to get a rough feel for the parameters. None of these numbers are intended for publication. Thanks! – zje Apr 17 '12 at 03:02