0

I have a set of data points pair (y,x). I want to fit a function using the form

y = c * x * log2(x)

I want to find the value of c. Matlab lsqcurvefit is not working for this. It seems to be stuck in local optima.

Any suggestions on how to do it?

Thanks!

cdbitesky
  • 1,390
  • 1
  • 13
  • 30
  • 1
    If I were to do it by hand I would plug each of the pairs into their x,y places in the function and solve for c. c = y/(x*lg2(x)). – cdbitesky Dec 11 '13 at 16:49
  • Hmm.. I get a critical point at c = E(yi - xi * log2 xi) with least-squares. No idea if it's correct though, I've suppressed most of the math I even may have known. – doynax Dec 11 '13 at 16:55
  • Scratch that, I'm out of my depth and admit defeat. Possibly c = E(yi) / E(xi log2 xi) – doynax Dec 11 '13 at 17:05
  • 1
    How is `lsqcurvefit` not working? Can you post some code and a subset of data? – craigim Dec 11 '13 at 17:49

2 Answers2

3

As cdbitesky wrote, the simplest way to estimate c is to compute pointwise ratios and take the mean:

c_est = mean(y ./ (x .* log2(x)));

Another would be to use Matlab's matrix division, which performs a least squares fit:

c_est = y / (x .* log2(x));

The optimal way to estimate c can only be derived if you have an idea how (if at all) your data deviate from the ideal equation y = c * x * log2(x). Are your data corrupted by additive noise or multiplicative? Where does this noise come from? etc.

A. Donda
  • 8,381
  • 2
  • 20
  • 49
  • Thanks all for the help. I do understand the problem now. By the method mentioned above, I estimated the c's for the individual pairs and found that it varied widely. So the data is actually noisy(not a perfect fit for cnlog(n), which is why matlab was off in its estimates! – Aniket Chakrabarti Dec 11 '13 at 19:12
  • @AniketChakrabarti, you're welcome. – In that case, the second method might give the better estimate. As I wrote, in order to come up with something even better it would be necessary to know more about the noise. – A. Donda Dec 11 '13 at 19:14
1

Using some weights w[k], compute the sums

yxlx over w[k]*y[k]*x[k]*log2(x[k]) and

xlx2 over w[k]*sqr(x[k]*log2(x[k])), where sqr(u)=u*u.

Then the estimate for c is yxlx/xlx2.

One can chose the standard weights w[k]=1 or adapting weights

w[k]=1/( 1+sqr( x[k]*log2(x[k]) ) )

or even more adapting

w[k]=1/( 1+sqr( x[k]*log2(x[k]) ) +sqr( y[k] ) ) 

so that large values for x,y do not excessively influence the estimate. For some middle strategy take the square root of those expressions as weights.


Mathematics: These formulas result from the formulation of the estimation problem as a weighted least square problem

sum[ w(x,y)*(y-c*f(x))^2 ]        over (x,y) in Data 

which expands as

sum[ w(x,y)*y^2 ] 
     -2*c* sum[ w(x,y)*y*f(x) ] 
          + c^2 * sum[ w(x,y)*f(x)^2 ]      over (x,y) in Data 

where the minimum is located at

c = sum[ w(x,y)*y*f(x) ] / sum[ w(x,y)*f(x)^2 ]

w(x,y) should be approximately inverse to the variance of the error at (x,y), so if you expect a uniform size of the error, then w(x,y)=1, if the error grows proportional to x and y, then w(x,y)=1/(1+x^2+y^2) or similar is a sensible choice.

Lutz Lehmann
  • 25,219
  • 2
  • 22
  • 51
  • I'm not sure I understand the logic of your answer. Why would a weighted mean of y include `x[k]*log2(x[k])`, and a weighted mean of x a square? Have you tried these calculations and are sure they actually result in the correct value? Moreover, it would be better if you included valid Matlab code, since the question was asked in a Matlab context. – A. Donda Dec 13 '13 at 14:18
  • 1
    Yes, these are fairly standard constructions for weighted least squares. I'll add a section about that. – Lutz Lehmann Dec 13 '13 at 14:47