
I have thousands of line segments that I'd like to cluster by colinearity. One way to do this is to make an associative container whose keys are infinite lines. With such a container, I could use a collection of line segments as the values and add a line segment by determining the infinite line of which it is a segment and inserting it into the corresponding bin.

Given such a setup, what is the best way to characterize the infinite lines so that the data structure can be queried for line keys that are near a given line?

For example, I was thinking of using an R-tree of points (elsewhere in this project I am already using Boost.Geometry R-trees) where each point is the x-intercept and y-intercept of an infinite line. However, this only works for lines that are neither vertical nor horizontal. I could handle vertical and horizontal lines as special cases, but then I would not be able to easily query for lines that are "near" a vertical or horizontal line the way I can query for lines that are near a non-axis-aligned line, namely by doing a 2D range query of the intercept points in the R-tree.

I'm wondering if there is some elegant way of handling this problem. How can I represent infinite 2D lines as points such that horizontal and vertical lines are no different than any other kind of line and such that lines that are near each other map to points that are near each other?

jwezorek
  • With the modicum of math intuition bestowed upon me by the gods I'd say this cannot be optimized, except if e.g. it were known that all the lines were parallel or something – sehe Aug 18 '20 at 18:25
  • I see symptoms of an XY question. –  Aug 18 '20 at 19:20
  • To some extent, this could be addressed by the polar representation of the lines (direction angle + distance from the origin). Thick lines can be represented as one angle + two distances. –  Aug 18 '20 at 19:22
  • Regarding this being an XY question, if someone has an idea for a better way to group thousands of 2D line segments by approximate colinearity, i am all ears. – jwezorek Aug 19 '20 at 15:17
  • Do you mean colinearity in the statistical sense? I have a feeling there will be more apt formulas for this from the statistics domain than from the geometric domain. If you want to look at a specific cartesian window of the (co)domains then I would intuit that you could make a metric (like possibly an integral over the domain of the window, perhaps refined with the angle between the segment pair, e.g. `*sin(α-β)`) – sehe Aug 21 '20 at 13:23
  • Disclaimer: I did still attack this from the geometric POV, because I don't know a thing about the statistical description of co-linearity, so I'd still recommend researching that first. – sehe Aug 21 '20 at 13:24
  • Yeah, I mean colinear in the geometric sense. Basically what I was proposing in this question works if you use a 3D R-tree of points where each point is (rho, cos(theta), sin(theta)), where rho is the distance of the line from the origin along its normal and theta is the angle of that normal. I may experiment more with doing it this way, but what I am actually implementing is going in another direction, in which I do not have to find colinear segments at scale. On smaller clusters, just doing brute force is not bad. – jwezorek Aug 21 '20 at 14:56

1 Answer


I have two solutions. The first is a simple one with some limitations:

For each infinite line, you could compute the point on the line where the perpendicular drawn from the origin meets the line. You could store the coordinates of this point as a "signature" of that line. This solution will work for all lines except those that pass through the origin. That is because when the line passes through the origin, the "signature" point will always be the origin no matter the slope of the line.

The second solution extends the first one to solve that problem: in addition to the coordinates of the point described above, you can also store the angle that the normal of the line makes with the x-axis. So you'd be representing each line with an ordered triplet (x, y, theta). You can store these triplets in an R-tree of 3D points and query that tree.

Two lines that pass through the origin could have theta values of pi/4 and 5*pi/4 radians respectively. They'd be coincident, but the way they are stored in the R-tree doesn't reflect that. So, just for the lines that pass through the origin, you could enforce a convention, say that theta must lie in [0, pi). Such a convention would fix the problem. It should only be enforced for lines that pass through the origin, because for any other line theta and theta + pi describe two different lines on opposite sides of the origin.
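One way the (x, y, theta) triplet and the origin convention might be computed is sketched below. The names `LineKey` and `line_key` and the tolerance `eps` used to decide that a line passes through the origin are assumptions of this sketch, not part of any library; the resulting triplet would then be inserted as a 3D point into whatever R-tree you are using.

```cpp
#include <cmath>
#include <optional>

// Hypothetical types for this sketch.
struct Point { double x, y; };
struct LineKey { double x, y, theta; };

// Builds the (x, y, theta) key for the infinite line through a and b.
// eps is an assumed absolute tolerance for "passes through the origin".
std::optional<LineKey> line_key(Point a, Point b, double eps = 1e-9) {
    const double pi = std::acos(-1.0);
    const double dx = b.x - a.x;
    const double dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::nullopt;  // degenerate segment

    // Unit normal of the line and signed distance of the line from the origin.
    double nx = -dy / len;
    double ny = dx / len;
    double rho = a.x * nx + a.y * ny;
    if (rho < 0.0) {  // make the normal point from the origin toward the line
        nx = -nx;
        ny = -ny;
        rho = -rho;
    }

    double theta = std::atan2(ny, nx);  // in [-pi, pi]
    if (rho <= eps) {
        // Line through the origin: fold theta into [0, pi).
        if (theta < 0.0) theta += pi;
        if (theta >= pi) theta -= pi;
        return LineKey{0.0, 0.0, theta};
    }
    return LineKey{rho * nx, rho * ny, theta};
}
```

With this sketch, the segment from (1, 1) to (2, 2) and the same segment with its endpoints swapped both map to (0, 0, 3*pi/4). Note that theta still wraps around: two nearly coincident lines through the origin with theta just above 0 and just below pi end up far apart in the key space, so a range query near either end of the interval may need to check the other end as well.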

Update:

Coming up with a solution that is better optimized for your use-case will require a clear definition of how you measure the "proximity" between two infinite lines.