The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:
- Compute the line equation between first and last point
- Check all other points to find what is the most distant from the line
- If the worst point is below the tolerance you want then output a single segment
- Otherwise call recursively considering two sub-arrays, using the worst point as splitter
In python this could be
def simplify(pts, eps):
if len(pts) < 3:
return pts
x0, y0 = pts[0]
x1, y1 = pts[-1]
m = float(y1 - y0) / float(x1 - x0)
q = y0 - m*x0
worst_err = -1
worst_index = -1
for i in xrange(1, len(pts) - 1):
x, y = pts[i]
err = abs(m*x + q - y)
if err > worst_err:
worst_err = err
worst_index = i
if worst_err < eps:
return [(x0, y0), (x1, y1)]
else:
first = simplify(pts[:worst_index+1], eps)
second = simplify(pts[worst_index:], eps)
return first + second[1:]
print simplify([(0,0), (10,10), (20,20), (30,30), (50,0)], 0.1)
The output is [(0, 0), (30, 30), (50, 0)]
.
About python syntax for arrays that may be non obvious:
x[a:b]
is the part of array from index a
up to index b
(excluded)
x[n:]
is the array made using elements of x
from index n
to the end
x[:n]
is the array made using first n
elements of x
a+b
when a
and b
are arrays means concatenation
x[-1]
is the last element of an array
An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps
can be seen here.