c# most efficient way to extract non-overlapping sorted ranges

Question

In order to highlight multiple search text elements in a single string, I have a routine computing the ranges to be highlighted within the string itself.

For example, if I search for "his" and "is" in the string "this is my misk test", I get the highlighted ranges [1,3], [2,3], [5,6], and [12,13].

So my desired result here would be [1,3], [5,6], and [12,13].

Is there a general way to extract non-overlapping ranges from the above list? Or even better, is there a string-specific way to get those?

Shouldn't the desired result be `[1,3]`, `[5,6]` and `[12,13]`? – InBetween, Mar 30 '17 at 10:52

score 1 · Answer 1 · answered Mar 30 '17 at 11:00

1. Sort the ranges by start index. (Your procedure most likely already does this.)
2. Select the first range.
3. Skip all ranges that start before the currently selected range ends (keep checking the next one until you find a range that starts after the currently selected range ends).
4. Select the new range.
5. Go to 3.
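The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: `RangeFilter` and `SkipOverlapping` are made-up names, ranges are inclusive (Start, End) index pairs, and the tuple syntax needs C# 7.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeFilter
{
    // Hypothetical helper: ranges are assumed to be inclusive (Start, End) index pairs.
    public static List<(int Start, int End)> SkipOverlapping(
        IEnumerable<(int Start, int End)> ranges)
    {
        var result = new List<(int Start, int End)>();

        // Step 1: sort by start index; on equal starts the longer range comes first.
        foreach (var range in ranges.OrderBy(r => r.Start).ThenByDescending(r => r.End))
        {
            // Steps 2-5: keep a range only if it starts after the last kept range ends;
            // everything that starts earlier is skipped.
            if (result.Count == 0 || range.Start > result[result.Count - 1].End)
            {
                result.Add(range);
            }
        }

        return result;
    }
}
```

Calling `RangeFilter.SkipOverlapping(new[] { (1, 3), (2, 3), (5, 6), (12, 13) })` keeps (1, 3), (5, 6) and (12, 13). Note that a skipped range is dropped entirely, so a range that only partly overlaps the selected one does not extend it.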

If you want to do it text based, it depends on the complexity of your possible search patterns (regexes?). If you specify this, I'd be happy to try to help you out.

answered Mar 30 '17 at 11:00

Soronbe


For regex, there are two issues I am concerned about here: 1) the index lookup should be case-insensitive and accent-insensitive; 2) the input as well as the search texts might be pretty much anything... – neggenbe Mar 30 '17 at 11:22

InBetween · Answer 2 · 2017-03-30T14:25:07.110

What do you mean by efficient? Fast? Least memory usage? Maintainable code?

There are tons of ways you could solve this. Some of them seem to entail a lot of code, but maybe that's not bad. For example, consider the following approach:

using System;
using System.Diagnostics;

// An immutable interval over any comparable type.
public struct Interval<T> where T: IComparable<T>
{
    public T LowerBound { get; }
    public T UpperBound { get; }

    public Interval(T lowerBound, T upperBound)
    {
        Debug.Assert(upperBound.CompareTo(lowerBound) > 0);
        LowerBound = lowerBound;
        UpperBound = upperBound;
    }

    // Strict comparisons: intervals that merely touch at an endpoint do not count as overlapping.
    public static bool AreOverlapping(Interval<T> first, Interval<T> second) =>
        first.UpperBound.CompareTo(second.LowerBound) > 0 &&
        second.UpperBound.CompareTo(first.LowerBound) > 0;

    public static Interval<T> Union(Interval<T> first, Interval<T> second)
    {
        Debug.Assert(AreOverlapping(first, second));
        return new Interval<T>(Min(first.LowerBound, second.LowerBound),
                               Max(first.UpperBound, second.UpperBound));
    }

    public override string ToString() => $"[{LowerBound}, {UpperBound}]";

    private static T Min(T t1, T t2)
    {
        if (t1.CompareTo(t2) <= 0)
            return t1;

        return t2;
    }

    private static T Max(T t1, T t2)
    {
        if (t1.CompareTo(t2) >= 0)
            return t1;

        return t2;
    }
}

And now, our method to extract non-overlapping intervals would be:

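using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class; the class name is a placeholder.
public static class IntervalExtensions
{
    // Merges overlapping intervals so that the returned intervals no longer overlap.
    public static IEnumerable<Interval<T>> GetOverlappingIntervals<T>(this IEnumerable<Interval<T>> intervals)
        where T : IComparable<T>
    {
        var stack = new Stack<Interval<T>>();

        foreach (var interval in intervals.OrderBy(i => i.LowerBound))
        {
            if (stack.Count == 0)
            {
                stack.Push(interval);
            }
            else
            {
                var previous = stack.Peek();

                if (Interval<T>.AreOverlapping(interval, previous))
                {
                    // Replace the top of the stack with the merged interval.
                    stack.Pop();
                    stack.Push(Interval<T>.Union(interval, previous));
                }
                else
                {
                    stack.Push(interval);
                }
            }
        }

        // A Stack<T> enumerates last-in first-out; reverse it to return ascending order.
        return stack.Reverse();
    }
}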

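Note that this solution does not merge adjacent intervals: intervals that merely touch, such as [1,3] and [3,5], are left separate because AreOverlapping uses strict comparisons. I'm not sure whether this is what you want.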
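If adjacent ranges should be merged as well, one option for integer index ranges is a single pass over the sorted input. The following is a minimal sketch that stands apart from the code above: `RangeMerger` and `MergeTouching` are hypothetical names, ranges are assumed to be inclusive (Start, End) character indexes, and it needs C# 7.0 tuples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeMerger
{
    // Hypothetical helper: ranges are assumed to be inclusive (Start, End) character indexes.
    public static List<(int Start, int End)> MergeTouching(
        IEnumerable<(int Start, int End)> ranges)
    {
        var merged = new List<(int Start, int End)>();

        foreach (var range in ranges.OrderBy(r => r.Start))
        {
            int last = merged.Count - 1;

            // The "+ 1" makes directly adjacent index ranges, e.g. (1, 3) and (4, 5), merge too.
            if (last >= 0 && range.Start <= merged[last].End + 1)
            {
                // Extend the last merged range instead of adding a new one.
                merged[last] = (merged[last].Start, Math.Max(merged[last].End, range.End));
            }
            else
            {
                merged.Add(range);
            }
        }

        return merged;
    }
}
```

For the input (1, 3), (2, 3), (4, 5), (12, 13) this yields (1, 5) and (12, 13). Dropping the "+ 1" keeps (4, 5) as a separate range, since it shares no index with (1, 3).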

Is this solution maintainable? Yes, the code is pretty much self-explanatory. Is it the most efficient? Well, probably not, but who cares if it's "efficient" enough and it meets your performance goals. If it doesn't, then do start optimizing it.
