3

I am trying to plot a large number of points using some library. The points are ordered by time, and their values can be considered unpredictable.

My problem at the moment is that the sheer number of points makes the library take too long to render. Many of the points are redundant (that is, they lie on the same line, as defined by a function y = ax + b). Is there a way to detect and remove redundant points in order to speed up rendering?

Thank you for your time.

nc3b

3 Answers

8

The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:

  1. Compute the line equation between the first and last point
  2. Check all other points to find the one most distant from the line
  3. If the worst point is within the tolerance you want, output a single segment
  4. Otherwise recurse on the two sub-arrays, using the worst point as the splitter

In Python this could be:

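def simplify(pts, eps):
    # Nothing can be removed from fewer than three points
    if len(pts) < 3:
        return pts
    x0, y0 = pts[0]
    x1, y1 = pts[-1]
    # Line y = m*x + q through the first and last point
    # (assumes x1 != x0, which holds for points ordered by time)
    m = float(y1 - y0) / float(x1 - x0)
    q = y0 - m*x0
    # Find the interior point with the largest vertical error
    worst_err = -1
    worst_index = -1
    for i in range(1, len(pts) - 1):
        x, y = pts[i]
        err = abs(m*x + q - y)
        if err > worst_err:
            worst_err = err
            worst_index = i
    if worst_err < eps:
        # Every interior point is within tolerance: one segment is enough
        return [(x0, y0), (x1, y1)]
    else:
        # Split at the worst point; it ends `first` and starts `second`
        first = simplify(pts[:worst_index+1], eps)
        second = simplify(pts[worst_index:], eps)
        return first + second[1:]

print(simplify([(0,0), (10,10), (20,20), (30,30), (50,0)], 0.1))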

The output is [(0, 0), (30, 30), (50, 0)].

Some Python syntax for arrays (lists) that may not be obvious:

  • x[a:b] is the part of the array from index a up to index b (excluded)
  • x[n:] is the array made of the elements of x from index n to the end
  • x[:n] is the array made of the first n elements of x
  • a+b, when a and b are arrays, means concatenation
  • x[-1] is the last element of an array

An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps can be seen here.
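
The recursive version above copies a slice of the array at every split. The following is a minimal sketch of a variant that works on start/stop indexes with an explicit stack instead, so no slices are copied and there is no recursion depth to worry about. The name `simplify_indexed` is hypothetical, and the sketch assumes the x values are strictly increasing (as they are for time-ordered points); it is meant to apply the same splitting rule as `simplify`, not to replace it.

```python
def simplify_indexed(pts, eps):
    """Apply the same splitting rule as simplify(), using index ranges."""
    if len(pts) < 3:
        return list(pts)
    keep = [False] * len(pts)
    keep[0] = keep[-1] = True            # end points always survive
    stack = [(0, len(pts) - 1)]          # ranges still to be examined
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:                  # no interior points in this range
            continue
        x0, y0 = pts[lo]
        x1, y1 = pts[hi]
        m = (y1 - y0) / (x1 - x0)        # assumes x1 != x0
        q = y0 - m * x0
        worst_err, worst_index = -1.0, -1
        for i in range(lo + 1, hi):
            x, y = pts[i]
            err = abs(m * x + q - y)     # vertical distance from the chord
            if err > worst_err:
                worst_err, worst_index = err, i
        if worst_err >= eps:
            keep[worst_index] = True     # this point must stay
            stack.append((lo, worst_index))
            stack.append((worst_index, hi))
    return [p for p, k in zip(pts, keep) if k]


result = simplify_indexed([(0, 0), (10, 10), (20, 20), (30, 30), (50, 0)], 0.1)
print(result)
```

On the sample input used earlier this should give the same three points as the recursive version.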

Andrew Whitaker
6502
  • I am not sure I understand what the first and last points are. – nc3b Jan 16 '11 at 22:09
  • That's only valid if your first and last points can be guaranteed not to be outliers themselves. Usually a more thorough analysis would be employed, like "least squares". – Lightness Races in Orbit Jan 16 '11 at 22:10
  • 2
    @Tomalak: I think you're confusing the problem. Here we're not removing outliers, but "boring" points. There is no problem at all if first and last point are far from a fitting line. Sure this simplification may not be "optimal", but it's very fast and easy to code. – 6502 Jan 16 '11 at 22:25
  • @nc3b: I meant the first and last point in the array. I added a python implementation; it should be reasonably easy to read even if you don't know python but know other imperative languages. Note that it can be made much faster than this by avoiding copying data around and using start/stop indexes instead. – 6502 Jan 16 '11 at 22:38
  • @6502: I didn't suggest removing outliers. I'm pointing out that to remove "boring" points, you need to make sure that they are indeed boring -- that is, that they are not outliers. If either of the first or last point in the dataset is an outlier, your approach will completely break. – Lightness Races in Orbit Jan 17 '11 at 14:41
  • @Tomalak: Before downvoting, did you actually spend any time trying to understand how the algorithm works? The first and last point, for example, are NEVER removed, no matter what the values in the array are. I added an example of the results of this algorithm on a graph with 100,000 points. – 6502 Jan 17 '11 at 22:38
  • @6502: I'm sorry that I don't appear to have been clear. I never said that the first and last point are removed. What I said is that generating a line of best fit -- which your entire algorithm is subsequently based on -- cannot possibly work simply by taking the first and last point. If either are an outlier, your entire baseline is wrong. You must consider *all* points to generate a line-of-best-fit. – Lightness Races in Orbit Jan 17 '11 at 22:57
  • @Tomalak: The line of best fit is something that you are insisting on but that's totally unrelated to this problem. The input is not a (straight) line, it's a graph. We're NOT looking for a line, but for a similar graph with less points. If even the shown results of this algorithm didn't make this clear to you then I'm sorry but I think that nothing will. Note that I actually LOVE least squares... in a booklet of formulas I wrote for my colleagues 20 out of 56 pages are about least squares (e.g. see page 47: http://goo.gl/QmszB). However for this problem IMO they're just the WRONG tool. – 6502 Jan 17 '11 at 23:45
  • @6502: You keep jumping ahead when parsing my suggestion. I know we're not looking for a line; I know we're looking for a similar graph with less points. Your step one to get there creates a line: my response is, and always has been, that your method to determine that line is fundamentally flawed. – Lightness Races in Orbit Jan 18 '11 at 00:34
  • 1
    @Tomalak: You apparently don't get the point that if the output has to be connected straight line segments, then there are simply no degrees of freedom left to use LSQ for. I also found that the algorithm was already invented in 1972 by Ramer-Douglas-Peucker in the original form I first thought about it (general n-dimensional polyline instead of a graph), so I added a link to the relevant page. BTW: I also noticed you didn't even take the time to click on the result link (goo.gl tells me so), so I'll just classify you as trolling and move on. – 6502 Jan 18 '11 at 07:58
  • @6502: The link was irrelevant and your personal attack means that this conversation is over. Your solution is still incorrect; it's a shame that you haven't bothered to consider this possibility, and that you haven't bothered to ask about the elements of my explanation that you have not understood. Good luck in the future. – Lightness Races in Orbit Jan 18 '11 at 10:38
  • Many thanks, from the distant future, for this one, @6502. Just what I was looking for. – Michael Tyson Oct 13 '22 at 00:06
0

I came across this question after having this very idea: skip redundant points on plots. I believe I came up with a far better and simpler solution, and I'm happy to share it as my first proposed solution on SO. I've coded it and it works well for me. It also takes the screen scale into account: there may be 100 points in value between those plot points, but if the user has sized the chart small, they won't see them.

So, while iterating through your data in the plot loop, before you draw/add your next data point, look at the next value ahead and calculate the change in screen scale (or in value, but I think screen scale is better for the reason mentioned above). Now do the same for the value after that. Getting these values is just a matter of peeking ahead in your array/collection/list/etc. by adding the step increment (probably 1/2) to the current loop index. If the two changes are the same (or differ only very slightly, per your own preference), you can skip this one point in your chart by simply adding 'continue' in the loop, because the point lies exactly on the slope between the point before it and the point after it.

Using this method, I reduced a chart from 963 points to 427, for example, with absolutely zero visual change.

I think you might need to perhaps read this a couple of times to understand, but it's far simpler than the other best solution mentioned here, much lighter weight, and has zero visual effect on your plot.
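
One possible reading of the peek-ahead rule described above, as a hedged sketch rather than the code this answer refers to: a point is skipped when the change leading into it equals the change leading out of it. The names `skip_collinear`, `tol` and `to_screen` are hypothetical, and the sketch assumes evenly spaced samples, so that equal changes in value mean equal slopes. `to_screen` stands in for whatever maps a value to a pixel coordinate in your charting code.

```python
def skip_collinear(ys, tol=0.0, to_screen=lambda y: y):
    """Return the indexes of the points worth plotting.

    ys        -- values of evenly spaced samples (an assumption of this sketch)
    tol       -- largest difference between the two changes that is ignored
    to_screen -- hypothetical mapping from a value to a screen coordinate
    """
    n = len(ys)
    if n < 3:
        return list(range(n))
    s = [to_screen(y) for y in ys]
    kept = [0]                               # always plot the first point
    for i in range(1, n - 1):
        change_in = s[i] - s[i - 1]          # step leading into point i
        change_out = s[i + 1] - s[i]         # step leading out of point i
        if abs(change_out - change_in) <= tol:
            continue                         # point i lies on the same slope
        kept.append(i)
    kept.append(n - 1)                       # always plot the last point
    return kept


kept = skip_collinear([0, 10, 20, 30, 0])
print(kept)
```

Note that with a nonzero `tol` each point is compared only with its immediate neighbours, so a slow curve can be flattened more than the tolerance suggests; the recursive approach in the other answer bounds the error over a whole segment instead.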

user946207
-2

I would probably apply a "least squares" algorithm to obtain a line of best fit. You can then go through your points and downfilter consecutive points that lie close to the line. You only need to plot the outliers, and the points that take the curve back to the line of best fit.

Edit: You may not need to employ "least squares"; if your input is expected to hover around "y=ax+b" as you say, then that's already your line of best fit and you can just use that. :)
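
A minimal sketch of how this suggestion could be read, under stated assumptions: fit one least-squares line to all points, then, within each run of consecutive points that lie close to that line, keep only the first and last point of the run. The names `fit_line`, `downfilter` and `tol` are hypothetical, and treating "downfilter" as "keep the ends of each near-line run" is an interpretation, not something the answer spells out. It only helps when the whole data set hovers around a single line.

```python
def fit_line(pts):
    """Ordinary least-squares fit of y = a*x + b; needs two distinct x values."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def downfilter(pts, tol):
    """Drop points in the interior of runs that stay within tol of the fit."""
    a, b = fit_line(pts)
    near = [abs(a * x + b - y) <= tol for x, y in pts]
    out = []
    for i, p in enumerate(pts):
        interior = 0 < i < len(pts) - 1
        # A point is redundant only if it and both neighbours are near the line
        if interior and near[i - 1] and near[i] and near[i + 1]:
            continue
        out.append(p)
    return out


filtered = downfilter([(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)], 0.5)
print(filtered)
```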

Lightness Races in Orbit
  • This could work, but how would I choose the points that define y? The plot is constantly going up and down :-? – nc3b Jan 16 '11 at 22:11
  • The data is not a single line... but there are many parts that look linear and from which samples could be removed without changing the meaning of the chart. In other words y=mx+q only applies to sections of the data being plotted. – 6502 Jan 16 '11 at 22:59
  • @nc3b: That's what the "least squares" algorithm does for you. Look it up! – Lightness Races in Orbit Jan 17 '11 at 14:41
  • This answer makes no sense for the problem. What the OP asked was a way to simplify a graph without visible alteration of the shape, not a linear regression. LSQ fitting is a powerful technique, but not here. – 6502 Jan 17 '11 at 22:43
  • @6502: I have actually implemented solutions to this problem on numerous occasions, and they invariably take this form. To find a baseline description of a dataset, you must apply a line-of-best-fit. To find a line-of-best-fit you must consider all points. LSQ is designed for expressly this purpose. I can't see why it's not relevant. – Lightness Races in Orbit Jan 17 '11 at 22:59