0

edit: Hierarchy doesn't play nice with my goals here. Leaving original request, but I answered what fulfills the core request (overlap/non-overlap rules) below.

Assume there is some set of 1D lines, each described by 2 unsigned ints: start and end, with start < end. I want to create groups based on if they overlap, but I don't want groups to contain any lines that don't overlap. In the case of a line being in multiple groups, I guess I'll need some sort of hierarchy structure to track groups in groups in groups...

Here's the rules:

  • Lines that overlap must be grouped together as low on the hierarchy as possible.
  • Lines that don't overlap can't be in the same group.

Anyway, here's an example picture:

lines

From a quick look, I can say that Line A and Line C form Group 0, Line H and Line I form Group 1, and Line B is Group 2. Everything else is overlapping groups, with Line D being in Group 1 and Group 2, Line E in Group 0 and Group 1, and Line F and Line G are in all three of those groups. So there's two layers of grouping here, but I'm pretty sure there could be N depending on the complexity of the problem. And I'm also pretty sure there's a few catches to this that my example isn't representing.

What is the typical algorithm for handling this?

original_username
  • 2,398
  • 20
  • 24
user173342
  • 1,820
  • 1
  • 19
  • 45
  • It's not clear what the problem is. Are you trying to maximize the number of groups ? Can a group contain just one interval ? – krjampani Oct 04 '12 at 16:02
  • A group can only have one interval, and beyond that the goal is to group overlapping lines as low on the hierarchy as possible, along with no group containing any pair of non-overlapping lines in its set. I think that those two rules of overlap and non-overlap enforce a single interval per group. I also think they are sufficient to define a single result... But I'm not sure. – user173342 Oct 04 '12 at 16:05
  • It doesn't really make sense to say that you want them to be as low on the hierarchy as possible because if you have N groups, any permutation of them minimizes the sum of distances from the first position in the hierarchy. – argentage Oct 04 '12 at 16:11

2 Answers2

1

I realized that my two rules of overlapping and non-overlapping were incompatible with a hierarchical structure. Here's the algorithm I settled on, with pretty pictures to explain.

lines reforged

The basic idea ended up boiling down to finding "overlap maximas". The lines need to be sorted by start ascending, then you loop through from left to right. You start in a "closed" state, which will mean that there isn't an "open" group waiting to be closed out. Lines that are currently being iterated through are considered "active lines".

Rule 1: If in a closed state and a line start is encountered, switch to an open state. The points where this occurs are indicated with green dots. If multiple lines were opening a group during the same step, I gave them all with green dots since it is unknown which would be encountered first.

Continue collecting lines until an end is reached.

Rule 2: If in an open state and an active line is ending, all active lines (including any ending active lines) are placed into a new group. The algorithm switches to a closed state. The steps when groups are formed are highlighted in blue. The line ends triggering this are indicated by red dots. The rule for multiple dots in one step are the same as before.

For instance, the first group formed is Line A, Line C, Line F, and Line G. The above two rules are sufficient to process the example.

There is a final rule, which doesn't come into play during this example.

Rule 3: If the algorithm ends iteration in an open state, all active lines are placed into a new group.

Here's the 4 groups that result from this algorithm:

line groups

user173342
  • 1,820
  • 1
  • 19
  • 45
0

I'd say some variant of hierarchical clustering would fit the bill. The similarity function could be the degree of overlap between two line segments.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • Well, degree of overlap is inconsequential. They either overlap or not, so a boolean and nothing more. Would the algorithm work in that case? – user173342 Oct 04 '12 at 15:54
  • @user173342: you said you wanted some kind of hierarchy structure, and that's what these algorithms give you. If you want to keep things flat, then a simple [union-find](https://en.wikipedia.org/wiki/Disjoint-set_data_structure) algorithm would do the trick, but it would also cluster G, H and I together. – Fred Foo Oct 04 '12 at 15:59