
I'm sorry for the somewhat confusing title, but I wasn't sure how to sum this up more clearly.

I have two sets of X,Y data, each set corresponding to a general overall value. They are fairly densely sampled from the raw data. What I'm looking for is a way to find an interpolated X for any given Y for a value in between the sets I already have.

The graph makes this more clear:

A graph of points

In this case, the red line is from a set corresponding to 100, the yellow line is from a set corresponding to 50.

I want to be able to say, assuming these sets correspond to a gradient of values (even though they are clearly made up of discrete X,Y measurements), how do I find, say, where the X would be if the Y was 500 for a set that corresponded to a value of 75?

In the example here I would expect my desired point to be somewhere around here:

A graph of points with an interpolated point

I do not need this function to be overly fancy — it can be simple linear interpolation of data points. I'm just having trouble thinking it through.

Note that neither the Xs nor the Ys of the two sets overlap perfectly. However it is rather trivial to say, "where are the nearest X points these sets share," or "where are the nearest Y points these sets share."

I have used simple interpolation between known values (e.g. find the X for corresponding Ys for set "50" and "100", then average those to get "75") and I end up with something that looks like this:

Not very good interpolation

So clearly I am doing something wrong here. Obviously in this case X is (correctly) returning as 0 for all of those cases where the Y is higher than the maximum Y of the "lowest" set. Things start out great but somewhere around when one starts to approach the maximum Y for the lowest set it starts going haywire.

It's easy to see why mine is going wrong. Here's another way to look at the problem:

Illustration

In the "correct" version, X ought to be about 250. Instead, what I'm doing is essentially averaging 400 and 0 so X is 200. How do I solve for X in such a situation? I was thinking that bilinear interpolation might hold the answer but nothing I've been able to find on that has made it clear how I'd go about this sort of thing, because they all seem to be structured for somewhat different problems.
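To make the failure concrete, here is a minimal JavaScript sketch of the per-curve averaging described above. The names `xAtY` and `naiveBlend` and the sample curves are made up for illustration and are not from the actual project. It assumes each curve is an array of [x, y] pairs and that X is reported as 0 when a curve never reaches the requested Y.

```javascript
// Hypothetical helper: linearly interpolate X at a given Y along a curve
// stored as an array of [x, y] points. Returns 0 when the curve never
// reaches that Y, mirroring the behaviour described in the question.
function xAtY(curve, y) {
  for (let i = 0; i < curve.length - 1; i++) {
    const [x0, y0] = curve[i];
    const [x1, y1] = curve[i + 1];
    const lo = Math.min(y0, y1);
    const hi = Math.max(y0, y1);
    if (y >= lo && y <= hi && y0 !== y1) {
      const t = (y - y0) / (y1 - y0); // fraction of the way along this segment
      return x0 + t * (x1 - x0);
    }
  }
  return 0; // Y is outside the range covered by this curve
}

// Hypothetical helper: the naive midpoint, averaging the two per-curve X values.
function naiveBlend(curveA, curveB, y) {
  return (xAtY(curveA, y) + xAtY(curveB, y)) / 2;
}

// Made-up sample data, chosen so the larger curve gives X = 400 at Y = 500
// and the smaller curve tops out below Y = 500.
const outer = [[0, 1000], [400, 500], [800, 0]];
const inner = [[0, 400], [200, 200], [400, 0]];

console.log(naiveBlend(outer, inner, 500)); // 200
```

With these made-up numbers the larger curve gives 400 and the smaller one gives 0 at Y = 500, so the average is 200 instead of a value that follows the shape of the curves.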

Thank you for your help. Note that while I have obviously graphed the above data in R to make it easy to see what I'm talking about, the final work for this is in Javascript and PHP. I'm not looking for something heavy duty; simple is better.

nucleon
  • This sounds like more of a maths problem than a programming problem, so is probably better-suited for http://math.stackexchange.com or http://stats.stackexchange.com. – Oliver Charlesworth Dec 08 '13 at 17:30
  • I'm looking for a practical solution, as opposed to a theoretical one. My experience with those kinds of forums is they like to reply with an elegant equation that I have no idea how to implement as code. (Also, I understand literally none of the questions currently on the math or stats front pages, which isn't encouraging...) (An example of what I mean: http://math.stackexchange.com/questions/177491/how-to-perform-simple-linear-interpolation-on-a-data-set) – nucleon Dec 08 '13 at 17:40
  • Is the problem that you might not have a point on the red and/or yellow line? If so you could interpolate the value for the red and yellow line separately and then take the average..? – thebjorn Dec 08 '13 at 17:42
  • That's what the current function does. – nucleon Dec 08 '13 at 18:25
  • [This answer](http://mathoverflow.net/a/43091/43742) looks promising. Basically the idea is to do morphing between the two lines (as seen e.g. in morphing between two faces). Maybe this will point you in the right direction? Other, simpler ideas: One very simple idea would be linear interpolation, but on both axes. Just average the two interpolations together, maybe even weight them. Another idea is to find the closest point on the other line, for each point and each set. You would need to do this twice (once for every line), as the results are not symmetrical. Then somehow average them. – jmiserez Dec 08 '13 at 20:00
  • The morphing seems like too heavyweight of a solution for this. I've thought about dual-axis linear interpolation though how to implement that isn't entirely obvious to me. – nucleon Dec 08 '13 at 22:15
  • So I've concluded that this is actually an interestingly hard problem. The correct mid-point for any given point is actually something more complicated: imagine a line AB going from 0,0 to a point x1,y1 on the outer set. x2,y2 is the point on the inner set where AB intersects it. AB is thus defined by the fact that the proper interpolated X spot will intersect with the desired Y spot. But how to code it... – nucleon Dec 09 '13 at 04:05

1 Answer


Good lord, I finally figured it out. Here's the end result:

The final product

Beautiful! But what a lot of work it was.

My code is too cobbled and too specific to my project to be of much use to anyone else. But here's the underlying logic.

You have to have two sets of data to interpolate from. I am calling these the "outer" curve and the "inner" curve. The "outer" curve is assumed to completely encompass, and not intersect with, the "inner" curve. The curves are really just sets of X,Y data, and correspond to a set of values defined as Z. In the example used here, the "outer" curve corresponds to Z = 50 and the "inner" curve corresponds to Z = 100.

The goal, just to reiterate, is to find X for any given Y where Z is some number in between our known points of data.

  1. Start by figuring out how far the unknown Z lies between the Z values of the two curves, as a proportion from 0 to 1. So if Z = 75 in our example, that works out to 0.5. If Z = 60, that would be 0.2. If Z = 90, that would be 0.8. Call this proportion P.

  2. Select the data point on the "outer" curve where Y = your desired Y. Imagine a line segment between that point and 0,0. Define that as AB.

  3. We want to find where AB intersects with the "inner" curve. To do this, we iterate through each point on the inner curve. Define the line segment between the chosen point and the point+1 as CD. Check if AB and CD intersect. If not, continue iterating until they do.

  4. When we find an AB-CD intersection, we now look at the line created by the intersection and our original point on the "outer" curve from step 2. This line segment, then, is a line between the inner and outer curve where the slope of the line, were it to be continued "down" the chart, would intersect with 0,0. Define this new line segment as EF.

  5. Find the point that lies at proportion P (from step 1) along EF. Check its Y value. Is it our desired Y value? If it is (unlikely), return the X of that point. If not, see if its Y is less than the goal Y. If it is, store the position of that point in a variable, which I'll dub lowY, then go back to step 2 for the next point on the outer curve. If it is greater than the goal Y, see if lowY has a value in it. If it does, interpolate between the two points and return the interpolated X. (We have "boxed in" our desired coordinate, in other words.)
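The steps above can be sketched in JavaScript roughly as follows. This is not the original project code (which was never posted), just an illustrative reconstruction under some assumptions: each curve is an array of [x, y] pairs; the outer curve is ordered so that Y increases along the array; P is measured along EF from the outer curve (Z = 50) toward the inner curve (Z = 100); and the loop simply visits every outer point in order. All function names here (`proportion`, `segmentIntersection`, `radialInterpolateX`) and the sample data are hypothetical.

```javascript
// Step 1: proportion of the way from the outer curve's Z to the inner curve's Z.
function proportion(z, zOuter, zInner) {
  return (z - zOuter) / (zInner - zOuter);
}

// Intersection point of segments AB and CD, or null if they do not cross.
function segmentIntersection(a, b, c, d) {
  const rx = b[0] - a[0], ry = b[1] - a[1];
  const sx = d[0] - c[0], sy = d[1] - c[1];
  const denom = rx * sy - ry * sx;
  if (denom === 0) return null; // parallel or degenerate segments
  const t = ((c[0] - a[0]) * sy - (c[1] - a[1]) * sx) / denom; // position on AB
  const u = ((c[0] - a[0]) * ry - (c[1] - a[1]) * rx) / denom; // position on CD
  if (t < 0 || t > 1 || u < 0 || u > 1) return null;
  return [a[0] + t * rx, a[1] + t * ry];
}

// Steps 2-5: returns the interpolated X for goalY, or null if goalY could
// not be bracketed by two candidate points.
function radialInterpolateX(outer, inner, p, goalY) {
  const origin = [0, 0];
  let low = null; // last candidate whose Y was below the goal ("lowY")
  for (const f of outer) {
    // Steps 2-3: segment AB runs from the origin to the outer point f;
    // look for the inner segment CD that it crosses.
    let e = null;
    for (let i = 0; i < inner.length - 1 && e === null; i++) {
      e = segmentIntersection(origin, f, inner[i], inner[i + 1]);
    }
    if (e === null) continue; // this ray never hits the inner curve
    // Steps 4-5: candidate at proportion p along EF, from outer (f) to inner (e).
    const cand = [f[0] + p * (e[0] - f[0]), f[1] + p * (e[1] - f[1])];
    if (cand[1] === goalY) return cand[0];
    if (cand[1] < goalY) {
      low = cand;
      continue;
    }
    if (low !== null) {
      // The goal Y is boxed in between low and cand: interpolate linearly.
      const t = (goalY - low[1]) / (cand[1] - low[1]);
      return low[0] + t * (cand[0] - low[0]);
    }
  }
  return null;
}

// Made-up sample data: two straight "curves", so the result is easy to check.
const outer = [[800, 0], [600, 200], [400, 400], [200, 600], [0, 800]]; // Z = 50
const inner = [[400, 0], [200, 200], [0, 400]];                          // Z = 100
const p = proportion(75, 50, 100); // 0.5
console.log(radialInterpolateX(outer, inner, p, 225)); // 375
```

In this toy example the Z = 75 curve should be the line x + y = 600, and the sketch returns X = 375 for Y = 225, which lies on that line. Real data would be curved and unevenly sampled, so the result is only as good as the linear interpolation between neighbouring candidates.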

The above procedure works pretty well. It fails in the case of Y=0, but that case is easy to handle since you can just interpolate between those two specific points. Where the curves are sparsely sampled, it produces rather jagged results, but I guess that's to be expected (these are Z = 5000, 6000, 7000, 8000, 9000, 10000, where only 5000 and 10000 are known curves with only 20 data points each; the rest are interpolated):

Jaggy results
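For the Y = 0 special case mentioned above, one possible reading of "interpolation on those two specific points" is to blend the X values where each curve meets the X axis. The helper name `xAtZeroY` is made up, and the sketch assumes each curve contains a sample with Y exactly 0 and that P is measured from the outer curve toward the inner one.

```javascript
// Hypothetical helper for the Y = 0 case: blend the X values where each
// curve touches the X axis. Assumes each curve has a point with y === 0.
function xAtZeroY(outer, inner, p) {
  const outerBase = outer.find(([, y]) => y === 0);
  const innerBase = inner.find(([, y]) => y === 0);
  if (!outerBase || !innerBase) return null; // no sample on the X axis
  return outerBase[0] + p * (innerBase[0] - outerBase[0]);
}

// Made-up sample data: the outer curve meets the axis at 800, the inner at 400.
console.log(xAtZeroY([[800, 0], [0, 800]], [[400, 0], [0, 400]], 0.5)); // 600
```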

I am under no illusion that this is an optimized solution, but solving for gobs of points is practically instantaneous on my computer, so I assume it is not too taxing for a modern machine, at least with the number of total points I have (30-50 per curve).

Thanks for everyone's help; it helped a lot to talk this through a bit and realize that what I was really going for here was not any simple linear interpolation but a kind of "radial" interpolation along the curve.

nucleon
  • Really nice work man. I had the same question, but using matlab. Here is the answer if you want it =) http://stackoverflow.com/questions/23494254/interpolation-between-two-curves-matlab – Nikko May 07 '14 at 09:14
  • Your answer is something I need to do in the next few days. I'm going to give it a go and see if I can sort it out. Just wanted to say thanks! – la femme cosmique Sep 11 '18 at 17:21
  • And if you still happened to have some code lying about, I'd be most grateful if I could see it. – la femme cosmique Sep 11 '18 at 17:22
  • It's unfortunately VERY specific to the project and VERY convoluted. (At this point I only about half-understand it, it has been so long since I've looked at it.) One of these days I will try to create a generalized version of the same function... not today unfortunately! – nucleon Sep 28 '18 at 18:58