I'm attempting to fit a cubic spline to a time series using SciPy's interpolate.splrep. However, I can't work out how to determine a valid smoothing condition without manually adjusting it by eye. It seems like there should be a way to calculate this condition.

According to the docs, the smoothing condition should be determined in this way:

"Recommended values of s depend on the weights, w. If the weights represent the inverse of the standard-deviation of y, then a good s value should be found in the range (m-sqrt(2*m),m+sqrt(2*m)) where m is the number of datapoints in x, y, and w. default : s=m-sqrt(2*m) if weights are supplied. s = 0.0 (interpolating) if no weights are supplied."

However, after much testing, I haven't been able to get this to work with non-zero smoothing. With s in the documented range, the "fit" usually ends up looking like an arbitrary third-degree polynomial, whereas my dataset should look like a high-degree polynomial when fit properly. Just from fiddling around with the smoothing condition, I've found that s = 1E-9 balances closeness and smoothing well (I'm using weights with the data).

Does anyone have any ideas what's going on?

There are reasons I'm using a cubic spline over other interpolation methods, but I'm wondering if I should be looking elsewhere...

WillaB
  • The relevant sentence is: "The amount of smoothness is determined by satisfying the conditions: sum((w * (y - g))**2,axis=0) <= s where g(x) is the smoothed interpolation of (x,y)." – pv. Aug 29 '15 at 16:27
  • But how do I solve for s when we don't know what g(x) is? Maybe I'm missing something. – WillaB Aug 31 '15 at 17:03
  • The left-hand side is essentially the chi^2 that you want the fit to aim for. So you need to know or be able to guess how much noise your data has. Suppose you roughly know each data point has error dy. Then you have (for w=1) roughly s=m*dy**2 where m is the number of data points, and this is where the suggestion in the docstring comes from (taking dy=1). – pv. Sep 04 '15 at 16:07
  • This doesn't make sense to me. If you're assuming dy=1, then s=m, which doesn't work for my dataset (as I said in my question). Can you please explicitly post a solution where weights are used (and not equal to 1), and dy is therefore also not equal to 1? I appreciate your help thus far, but I seem to be missing what you're saying. An explicit response would be very helpful. – WillaB Sep 04 '15 at 19:57
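
The chi^2 reading in the comments above can be sketched explicitly. This is only an illustration under stated assumptions, not the asker's data: the signal (a sine), the number of points m, and the per-point noise level dy are all made up here. It fits the same data twice, once with weights w = 1/dy and s = m (the docstring's case), and once with no weights and s = sum(dy**2), which is the s = m*dy**2 estimate from the comment.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Synthetic data (assumed for illustration): smooth signal plus Gaussian
# noise with a known per-point standard deviation dy.
rng = np.random.default_rng(0)
m = 200
x = np.linspace(0.0, 10.0, m)
dy = np.full(m, 0.01)                 # assumed noise level, not from the question
y_true = np.sin(x)
y = y_true + rng.normal(0.0, dy)

# Weighted case: with w = 1/dy each term (w*(y - g))**2 is about 1,
# so the target sum is about m. The docstring's range is m +/- sqrt(2*m).
w = 1.0 / dy
s_low, s_high = m - np.sqrt(2 * m), m + np.sqrt(2 * m)
tck_w = splrep(x, y, w=w, k=3, s=m)

# Unweighted case (w = 1): the same target becomes sum(dy**2),
# which is m*dy**2 when dy is constant.
s_unweighted = float(np.sum(dy ** 2))
tck_u = splrep(x, y, k=3, s=s_unweighted)

# Check the condition sum((w*(y - g))**2) <= s for both fits.
chi2_w = float(np.sum((w * (y - splev(x, tck_w))) ** 2))
chi2_u = float(np.sum((y - splev(x, tck_u)) ** 2))
max_err = float(np.max(np.abs(splev(x, tck_w) - y_true)))

print("docstring range for s:", (s_low, s_high))
print("weighted:   s =", m, " chi2 =", chi2_w)
print("unweighted: s =", s_unweighted, " chi2 =", chi2_u)
print("knots used (weighted):", len(tck_w[0]), " max error vs. truth:", max_err)
```

The point of the sketch is that s is measured in units of (w * dy)**2 summed over the points, so s near m is only sensible when the weights really are 1/dy. If a hand-tuned value such as s = 1E-9 is what works, one possible explanation (an assumption, since the weights in the question are not shown) is that the weights are much smaller than 1/dy, in which case rescaling them or using s = sum((w*dy)**2) may be worth trying.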

0 Answers