
I'm currently writing a script that is supposed to remove redundant data points from my graph. My data includes overlaps from adjacent data sets and I only want the data that is generally higher. (Imagine two Gaussians with an x offset that overlap slightly. I'm only interested in the higher values in the overlap region, so that my final graph doesn't get all noisy when I combine the data in order to make a single spectrum.)

Here are my problems:

1) The x values aren't the same between the two data sets, so I can't just say "at x, take max y value". They're close together, but not equal.

2) The distances between x values aren't equal.

3) The data is noisy, so there can be multiple points where the data sets intersect. And while Gaussian A is generally higher after the intersection than Gaussian B, the noise means Gaussian B might still have SOME values which are higher. Meaning I can't just say "always take the highest values in this x area", because then I'd wildly combine the noise of both data sets.

4) I have n overlaps of this type, so I need an efficient algorithm. All I can come up with is something around O(n^3): "for each overlap, store the two data sets in two arrays, and for each combination of data points (x0,y0) and (x1,y1), cycle through until you find the combination with the lowest abs(x1-x0) AND abs(y1-y0)".
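On the cost side, pairing each point of one data set with its nearest x in the other does not require an all-pairs search. If the reference x values are sorted, a binary search finds each nearest neighbour in O(log n), so pairing m points against n points costs O(m log n) instead of O(m·n). A minimal NumPy sketch, assuming the reference array is sorted ascending and has at least two points (the function name `nearest_x_indices` is hypothetical):

```python
import numpy as np


def nearest_x_indices(x_ref, x_query):
    """For each value in x_query, return the index of the closest value in x_ref.

    Assumes x_ref is sorted ascending and has at least two elements.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Binary search: position where each query value would be inserted into x_ref
    right = np.searchsorted(x_ref, x_query)
    # Keep both neighbours inside the array (handles queries beyond either end)
    right = np.clip(right, 1, len(x_ref) - 1)
    left = right - 1
    # Pick whichever neighbour is closer in x (ties go to the left one)
    use_left = np.abs(x_query - x_ref[left]) <= np.abs(x_ref[right] - x_query)
    return np.where(use_left, left, right)


idx = nearest_x_indices([0.0, 1.0, 2.0, 3.0], [-1.0, 0.4, 0.6, 2.9, 5.0])
print(idx)  # [0 0 1 3 3]
```

This only solves the "x values are close but not equal" part; it does not by itself deal with the noisy crossovers.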

As I'm not a programmer, I'm completely lost. I also wasn't able to find an algorithm for this problem anywhere - most algorithms assume that the entries in the arrays I'm comparing are equal integers, but I'm working with almost-equal floats.

I'm using IDL, but I'd also be grateful for a general algorithm or at least a tip on what I could try. Thanks!

  • Isn't "suitable methods for solving a problem" kind of the definition of algorithms?? How is this not programming? – PoorYorick Mar 31 '16 at 10:14
  • Fun fact: A graph is a network and only that. Using the word graph for a plot is bad practice passed down from bad high-school teachers. Also this belongs on http://stats.stackexchange.com/ – Ulf Aslak Apr 02 '16 at 20:22
  • Fun fact: In physics, a plot is a graph. But alright, I will ask in stats.stackexchange.com, even though my main problem is with the algorithm, not with the science behind it. – PoorYorick Apr 04 '16 at 12:12

1 Answer


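One way to do this is to fit a Gaussian to each data set and then take the maximum, treating each data point as if it were equal to its fitted Gaussian at that x. Because the fitted curves are smooth, they cross only once in the overlap region, so the noise no longer produces multiple switch-over points.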

This can be done as follows:

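  • Fit a Gaussian G1 to data set X1 and a Gaussian G2 to data set X2, where the mean of G1 is less than the mean of G2.
  • Then, find the x where G1 and G2 intersect: setting the two Gaussians equal and taking logarithms gives a quadratic equation in x, and the root lying between the two means is the crossover point.
  • Finally, for all values of x less than the intersection take the points from X1, and for all values of x greater than the intersection take the points from X2.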
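A minimal sketch of these steps in Python, as an illustration only; the same idea should carry over to IDL's curve-fitting routines. It assumes NumPy and SciPy (`scipy.optimize.curve_fit`) are available, and the helper names `gaussian`, `fit_gaussian`, `gaussian_intersection` and `splice` are hypothetical. Writing each Gaussian as a·exp(-(x-m)²/(2s²)), equating the logs of the two fits gives A·x² + B·x + C = 0, which the code solves for the crossover.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amp, mu, sigma):
    """Gaussian peak: amp * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return amp * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def fit_gaussian(x, y):
    """Least-squares Gaussian fit, started from rough guesses read off the data."""
    p0 = [y.max(), x[np.argmax(y)], (x.max() - x.min()) / 4.0]
    (amp, mu, sigma), _ = curve_fit(gaussian, x, y, p0=p0)
    return amp, mu, abs(sigma)  # the sign of sigma is irrelevant; keep it positive


def gaussian_intersection(p1, p2):
    """x where the two Gaussians are equal, restricted to lie between their means."""
    a1, m1, s1 = p1
    a2, m2, s2 = p2
    # ln(a1) - (x-m1)^2/(2 s1^2) = ln(a2) - (x-m2)^2/(2 s2^2)  ->  A x^2 + B x + C = 0
    A = 1.0 / (2 * s2 ** 2) - 1.0 / (2 * s1 ** 2)
    B = m1 / s1 ** 2 - m2 / s2 ** 2
    C = m2 ** 2 / (2 * s2 ** 2) - m1 ** 2 / (2 * s1 ** 2) + np.log(a1 / a2)
    if np.isclose(s1, s2):  # equal widths: the equation is linear in x
        roots = [-C / B]
    else:
        disc = B ** 2 - 4 * A * C
        if disc < 0:
            raise ValueError("fitted Gaussians do not intersect")
        sq = np.sqrt(disc)
        roots = [(-B + sq) / (2 * A), (-B - sq) / (2 * A)]
    lo, hi = min(m1, m2), max(m1, m2)
    between = [r for r in roots if lo <= r <= hi]
    if not between:
        raise ValueError("no intersection between the two peak centres")
    return between[0]


def splice(x1, y1, x2, y2):
    """Keep the left data set below the fitted crossover and the right one above it."""
    p1, p2 = fit_gaussian(x1, y1), fit_gaussian(x2, y2)
    if p1[1] > p2[1]:  # make data set 1 the one whose peak lies further left
        x1, y1, x2, y2, p1, p2 = x2, y2, x1, y1, p2, p1
    xc = gaussian_intersection(p1, p2)
    keep1, keep2 = x1 < xc, x2 >= xc
    x = np.concatenate([x1[keep1], x2[keep2]])
    y = np.concatenate([y1[keep1], y2[keep2]])
    order = np.argsort(x)
    return x[order], y[order], xc


# Synthetic example: two noisy peaks on different, unevenly spaced x grids
rng = np.random.default_rng(0)
x1 = np.linspace(-5.0, 5.0, 200)
y1 = gaussian(x1, 1.0, 0.0, 1.0) + rng.normal(0.0, 0.01, x1.size)
x2 = np.sort(rng.uniform(-2.0, 10.0, 150))
y2 = gaussian(x2, 1.0, 4.0, 2.0) + rng.normal(0.0, 0.01, x2.size)
x, y, xc = splice(x1, y1, x2, y2)
print(f"crossover at x = {xc:.3f}")  # close to 4/3 for these synthetic peaks
```

Each overlap needs only two fits and one linear pass over its points, so handling n overlaps scales roughly linearly in the total number of points. The result depends on how well a Gaussian describes each data set; if the peaks are not Gaussian, another smooth model could be fitted instead and its crossover found numerically.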