
I'm attempting to fit a curve to a 2D point cloud (x and y coordinates). So far, I've had limited success with scipy's interpolation routines, most notably UnivariateSpline, which produced the following (sub-optimal) fit (please ignore the colors): [Figure: UnivariateSpline fit]

However, this is clearly the wrong tool, since I need a final curve that can bend back on itself near the edges of my parabolic point cloud (so it is no longer a single-valued function y = f(x)).

I then read up on interp2d, for example, but don't understand what my z array would be. Is there a better class of packages that I'm perhaps overlooking?


Update 1: as suggested in the comments, I've redone this using scipy.interpolate.splprep; my general setup is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splprep, splev

pts = np.vstack((X.ravel(), Y.ravel()))  # X and Y contain my points
# s is an optional smoothing parameter (default used here)
(tck, u), fp, ier, msg = splprep(pts, u=None, per=0, k=3, full_output=True)
print('Spline score:', fp)  # goodness of fit flatlines above a certain s value; the default s lies in that range
u_new = np.linspace(u.min(), u.max(), 1000)  # u_new was not defined in the post; an evenly spaced grid is assumed
x_new, y_new = splev(u_new, tck, der=0)  # evaluate the parametric spline
plt.plot(x_new, y_new, 'k')
plt.show()

The plot is below. Can anyone suggest a method for automating the choice of s... possibly a loop that assesses the coefficient of determination for each s? Or is there something baked in?

[Figure: splprep fit]
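One way to explore the choice of s is to sweep it and record the residual `fp` that `splprep` returns with `full_output=True`, then look for the point where `fp` stops changing. The following is only a sketch under assumptions: the helper name `sweep_smoothing`, the synthetic noisy parabola, and the list of s values are all made up for illustration and are not from the post. The scipy documentation also suggests that, when the weights are inverse standard deviations of the data, a reasonable s lies roughly between m - sqrt(2m) and m + sqrt(2m) for m points.

```python
import numpy as np
from scipy.interpolate import splprep


def sweep_smoothing(x, y, s_values, k=3):
    """Fit a parametric spline for each s; record (s, fp, number of knots).

    Hypothetical helper for illustration, not part of scipy.
    """
    results = []
    for s in s_values:
        (tck, u), fp, ier, msg = splprep([x, y], s=s, k=k, full_output=True)
        results.append((float(s), float(fp), len(tck[0])))
    return results


# Synthetic, already ordered, noisy parabola standing in for the real point cloud.
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 200)
x = t + rng.normal(scale=0.02, size=t.size)
y = t**2 + rng.normal(scale=0.02, size=t.size)

# Candidate smoothing values (assumed; pick a range that suits the data's noise level).
results = sweep_smoothing(x, y, s_values=[0.05, 0.1, 0.2, 0.5, 1.0, 5.0])
for s, fp, n_knots in results:
    print(f"s={s:<5} fp={fp:.4f} knots={n_knots}")
```

Plotting `fp` and the knot count against s makes the flatline easy to spot; the smallest s on the plateau is one candidate, though whether it gives the visually preferred fit still has to be checked by eye.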


Update 2: I've since re-run this on two different point clouds and found that the ordering of the points significantly alters the outcome. When I re-order the points in the point cloud to follow an initial parabola fit, I get much better results with my spline. Even so, the results are still sub-optimal, as can be seen below.

[Figures: Spline fit 1 (default s-value); Spline fit 2 (default s-value)]

Are there any further adjustments I can make with this method? Alternatively, does anyone have a suggestion for a competitive approach I can investigate?
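The re-ordering step described in Update 2 could look something like the sketch below. It is an assumption about how such an ordering might be done, not the code actually used: it fits y = a·x² + b·x + c with `np.polyfit`, samples the fitted parabola densely, and orders each point by the position of its nearest sample along the parabola. The function name `order_along_parabola` and the sample count are hypothetical.

```python
import numpy as np


def order_along_parabola(x, y, n_samples=2000):
    """Return indices that sort the points along an initial parabola fit.

    Hypothetical helper: assumes the cloud is roughly a parabola opening along y.
    """
    coeffs = np.polyfit(x, y, deg=2)              # initial fit y = a*x**2 + b*x + c
    xs = np.linspace(x.min(), x.max(), n_samples)  # dense samples along the fit
    ys = np.polyval(coeffs, xs)
    # Squared distance from every point to every sample on the parabola.
    d2 = (x[:, None] - xs[None, :]) ** 2 + (y[:, None] - ys[None, :]) ** 2
    nearest = np.argmin(d2, axis=1)                # position of each point along the fit
    return np.argsort(nearest, kind="stable")


# Shuffled synthetic cloud standing in for the real data.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
y = x**2 + rng.normal(scale=0.02, size=x.size)

order = order_along_parabola(x, y)
x_ordered, y_ordered = x[order], y[order]
```

The ordered arrays can then be passed to `splprep` in place of the raw ones. The distance matrix has `len(x) * n_samples` entries, so for very large clouds the sample count would need to be reduced or the projection done in chunks.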


Update 3: actually, setting knots = 5 helps out tremendously: [Figure: knots=5]

ees
  • Note: Asking for help finding libraries is generally not welcome here. What libraries have you looked at? [Matplotlib](https://jakevdp.github.io/PythonDataScienceHandbook/04.12-three-dimensional-plotting.html)? – Torxed May 09 '20 at 20:03
  • @Torxed I'm asking for help identifying what class of libraries I should even be considering for this problem. I've experimented with scipy make_lsq_spline, Bspline, CubicSpline, make_interp_spline, UnivariateSpline, interpolate... but think I'm looking at the wrong class of interpolation methods. The right class seems like it would be interp2d, but I don't understand how my problem fits in there (as mentioned above) – ees May 09 '20 at 20:15
  • @Torxed please let me know if you think I should still remove this post. Thank you – ees May 09 '20 at 21:16
  • Not sure, I don't think I fully understand the question (partly because it's written in an academic way regarding an academic topic.. I'm just a simple peasant). You can probably leave it, but the attention it'll get is probably low :) – Torxed May 09 '20 at 21:25
  • @Torxed I wasn't intending to be high-brow at all. Thank you for calling this to my attention... I'm just trying to get a parabolic-esque line that goes through more of the points at the edges of the scatter plot. – ees May 09 '20 at 21:42
  • Ah hehe, you didn't come off as it ^^ It's just that I really don't understand these topics that well unless I've used/been in the same situation :) I suggest to keep it in its original state as these topics **are** more academic :) – Torxed May 09 '20 at 21:53
  • I don't think `interp2d` is what you are looking for: it looks like it is used when you have a function in 3d defined in the form `z=f(x,y)`. `interp1d` looks like it is closer, but would probably suffer the same problem as `UnivariateSpline` since it is for a 1-to-1 function `y=f(x)`. You need the result to be a parametric function. – Oli May 09 '20 at 22:03
  • Have you looked at the `scipy.interpolate.splprep` function: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splprep.html#scipy.interpolate.splprep ? – Oli May 09 '20 at 22:05
  • @Oli I've updated my post with your suggestion... it's looking very good but still not sure how this could be automated/optimized. – ees May 09 '20 at 22:33
  • @ees Do you have any other datasets where this value of `s` does not give the results you want? If not, I would stick with a hardcoded smoothness value. If you do have such a dataset, please include it in the question and say what results you are looking for. – Oli May 10 '20 at 15:15
  • @Oli I've presented an Update 2 in my original post. I'm using the default `s` value now, but also seeing that the goodness measure (weighted sum of squared residuals) flatlines after a given threshold `s` is reached (the default lies in this realm). The fits appear sub-optimal, wouldn't you say? – ees May 11 '20 at 17:33
  • The fit looks great to me... What specifically would you like to be different? – Oli May 11 '20 at 20:01
  • @Oli the fit with k=5 looks outstanding, I agree. Forgot to respond back after I posted it. On the off chance, do you happen to know if all of these spline methods require a general ordering in the dataset? For example, when the X-Y coordinate pairs are randomly reshuffled in the global coordinates list (such that X_i stays with Y_i but the pair move to a new row randomly), the fits turn to garbage. The method works amazing, however, when I give a generally correct ordering of points along the trajectory as a prerequisite. I've also seen this with the `sgolay` method, very strange. – ees May 12 '20 at 15:17
  • @ees I'm not an expert in this area, but as far as my searching could tell me, there is no provided function that does what you are looking for. As far as I understand, finding the correct order of the points is equivalent to the travelling salesman problem. A quick search yielded this solution: https://stackoverflow.com/a/44080908/6744133 . It should be possible to adapt this to re-order your points so they work well when you `splprep` them (be warned: this problem is well known for taking a long time to solve). – Oli May 12 '20 at 18:25
  • @Oli that's very interesting, I'll take a look. Thank you so much for all of your help!! – ees May 13 '20 at 16:57
  • @ees You're welcome! – Oli May 14 '20 at 14:09
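
As a rough illustration of the ordering problem discussed in the comments above, a greedy nearest-neighbour walk is a common cheap heuristic. This is a sketch only: it is not the method from the linked answer and not an exact travelling-salesman solution, and the function name `greedy_order` and the choice of starting point (the point with the smallest x, assumed to be one end of the curve) are assumptions made for illustration.

```python
import numpy as np


def greedy_order(points, start=0):
    """Order points by repeatedly hopping to the nearest unvisited point.

    Hypothetical helper; `points` has shape (n, 2) and `start` is the index
    of the point to begin from (ideally one end of the curve).
    """
    n = len(points)
    visited = np.zeros(n, dtype=bool)
    order = np.empty(n, dtype=int)
    current = start
    for i in range(n):
        order[i] = current
        visited[current] = True
        if i == n - 1:
            break
        # Squared distances from the current point; visited points are excluded.
        d2 = np.sum((points - points[current]) ** 2, axis=1)
        d2[visited] = np.inf
        current = int(np.argmin(d2))
    return order


# Shuffled synthetic cloud standing in for the real data.
rng = np.random.default_rng(3)
t = rng.permutation(np.linspace(-1.0, 1.0, 200))
points = np.column_stack([t, t**2])

start = int(np.argmin(points[:, 0]))  # assumed end of the curve
order = greedy_order(points, start=start)
ordered = points[order]               # pass ordered[:, 0], ordered[:, 1] to splprep
```

The walk costs O(n²) and can jump across the curve when the cloud is thick or noisy, so the result is worth plotting before it is fed to `splprep`.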

1 Answer


I faced a similar issue, and the solution to my problem was to use Principal Curves (https://hastie.su.domains/Papers/Principal_Curves.pdf).

This implementation available on GitHub could help you: https://github.com/zsteve/pcurvepy.

Virgilio K