I am planning to use boost::icl::interval_map in speed-critical code that will add intervals mapped to values to the map and remove them from it approximately equally often.

Basically, I need a container data structure for which I can define +=, -=, and == operators that perform as efficiently as possible. If I use a std::unordered_set as the value container, all three operators will be O(n), but I am wondering if I can do better by using some kind of tree (maybe there is something in Boost), although std::set seems like it would be worse: O(n log n) operations, I believe.
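To make the baseline concrete, here is a minimal sketch of the std::unordered_set option. The class name `id_set` and the choice of `int` element IDs are assumptions for illustration only. It provides the same three operators as the demo value class further down, and each one is expected linear time in the size of its operands.

```cpp
#include <cstddef>
#include <initializer_list>
#include <unordered_set>

// Hypothetical value type: a set of integer IDs backed by std::unordered_set.
class id_set
{
    std::unordered_set<int> ids_;
public:
    id_set() = default;
    id_set(std::initializer_list<int> ids) : ids_(ids) {}

    // Union: expected O(other.size()).
    id_set& operator+=(const id_set& other)
    {
        if (this != &other) {   // inserting a set into itself is a no-op
            ids_.insert(other.ids_.begin(), other.ids_.end());
        }
        return *this;
    }

    // Difference: expected O(other.size()).
    id_set& operator-=(const id_set& other)
    {
        if (this == &other) {   // avoid erasing while iterating the same set
            ids_.clear();
            return *this;
        }
        for (int id : other.ids_) {
            ids_.erase(id);
        }
        return *this;
    }

    // Equality: expected O(n) when both sets have the same size n.
    friend bool operator==(const id_set& a, const id_set& b)
    {
        return a.ids_ == b.ids_;
    }

    std::size_t size() const { return ids_.size(); }
};
```

A default-constructed `id_set` is the empty set, so subtracting a set from itself leaves a value that compares equal to `id_set{}`.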

Below is sample code that demonstrates how interval_maps work. "character_set" is a dumb class that supports all the operations needed to behave as the value of a map. It just uses a std::string as a bucket of characters for demonstration purposes. (Again, the following code is for tutorial purposes, for people who are unfamiliar with how a boost::icl::interval_map behaves; it has nothing to do with my actual code, so please don't post that character_set below is inefficient.)

#include <iostream>
#include <string>
#include <utility>
#include <boost/icl/interval_map.hpp>
#include <boost/algorithm/string/replace.hpp>

class character_set
{
private:
    std::string str_;
public:

    character_set(const std::string& str = {}) : str_(str)
    {}

    void operator+=(character_set ss)
    {
        str_ += ss.str_;
    }

    void operator-=(character_set ss)
    {
        boost::algorithm::replace_all(str_, ss.str_, "");
    }

    std::string str() const
    {
        return str_;
    }
};

bool operator==(character_set a, character_set b) {
    return a.str() == b.str();
}

std::ostream& operator<<(std::ostream& os, const character_set& ss)
{
    os << ss.str();
    return os;
}

using interval_map = boost::icl::interval_map<int, character_set>;
using interval = interval_map::interval_type;

void displayIntervals(const interval_map& intervals)
{
    for (const auto& i : intervals) {
        std::cout << i.first << " , " << i.second << std::endl;
    }
    std::cout << std::endl;
}

int main()
{
    interval_map intervals;

    std::cout << "-- before --" << std::endl;
    intervals += std::make_pair(interval::closed(5, 10), character_set("A"));
    intervals += std::make_pair(interval::closed(15, 20), character_set("B"));
    displayIntervals( intervals );

    std::cout << "-- during --" << std::endl;
    intervals += std::make_pair(interval::closed(8, 18), character_set("C"));
    displayIntervals(intervals);

    std::cout << "-- after --" << std::endl;
    intervals -= std::make_pair(interval::closed(8, 18), character_set("C"));
    displayIntervals(intervals);

}

The output of the above is as follows:

-- before --
[5,10] , A
[15,20] , B

-- during --
[5,8) , A
[8,10] , AC
(10,15) , C
[15,18] , BC
(18,20] , B

-- after --
[5,10] , A
[15,20] , B
jwezorek
  • How big of a value set has your container to support? Is it really about theoretical complexity or about real world speed? – n314159 Dec 13 '19 at 00:20
  • Real-world speed. I'm using this data structure as part of a solution to the problem of, given a set of line segments, finding all line segments that are parallel to at least one other line segment within a distance of k. It's a sweep-line algorithm where you reduce the parallel line segment problem to finding overlapping axis-aligned rectangles. There will be on the order of 10,000 line segments, but the interval map will only ever hold O(number of parallel line segments) entries, which I'm guessing will be in the dozens or low hundreds, but I'm not sure. – jwezorek Dec 13 '19 at 00:32
  • Hm, the 10,000 is probably a bit high for what I had in mind (and especially the difference between total and active objects makes my approach bad). I thought about using something like a bitmask (in the form of a `vector` that has a size of `number_possible_values/64`). This would have linear-time operations all around (since it would be `size()` many integer comparisons, bitwise ORs and bitwise NOTs), but the constant factor would be pretty low. The problem with this is that these operations are all linear in the total number of segments, not just in the active number. – n314159 Dec 13 '19 at 00:48
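The bitmask idea from the comment above could be sketched roughly as follows. The class name `bitmask_set` and its `insert`/`contains` helpers are hypothetical, and the sketch assumes each segment has already been assigned a small non-negative integer ID. Every operator walks one 64-bit word per 64 possible IDs, so the cost is linear in the total number of segments, as the comment points out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical value type: one bit per possible ID, packed into 64-bit words.
class bitmask_set
{
    std::vector<std::uint64_t> words_;
public:
    bitmask_set() = default;
    explicit bitmask_set(std::size_t universe_size)
        : words_((universe_size + 63) / 64, 0) {}

    void insert(std::size_t id)
    {
        if (id / 64 >= words_.size()) words_.resize(id / 64 + 1, 0);
        words_[id / 64] |= std::uint64_t{1} << (id % 64);
    }

    bool contains(std::size_t id) const
    {
        return id / 64 < words_.size()
            && ((words_[id / 64] >> (id % 64)) & 1u) != 0;
    }

    // Union: bitwise OR, word by word.
    bitmask_set& operator+=(const bitmask_set& other)
    {
        if (other.words_.size() > words_.size()) words_.resize(other.words_.size(), 0);
        for (std::size_t i = 0; i < other.words_.size(); ++i) words_[i] |= other.words_[i];
        return *this;
    }

    // Difference: bitwise AND with the complement, word by word.
    bitmask_set& operator-=(const bitmask_set& other)
    {
        const std::size_t n = std::min(words_.size(), other.words_.size());
        for (std::size_t i = 0; i < n; ++i) words_[i] &= ~other.words_[i];
        return *this;
    }

    // Equality: missing trailing words count as zero, so an emptied set
    // compares equal to a default-constructed one.
    friend bool operator==(const bitmask_set& a, const bitmask_set& b)
    {
        const auto& shorter = a.words_.size() <= b.words_.size() ? a.words_ : b.words_;
        const auto& longer  = a.words_.size() <= b.words_.size() ? b.words_ : a.words_;
        if (!std::equal(shorter.begin(), shorter.end(), longer.begin())) return false;
        return std::all_of(longer.begin() + shorter.size(), longer.end(),
                           [](std::uint64_t w) { return w == 0; });
    }
};
```

Treating missing words as zero in `operator==` is a design choice made here so that two sets with the same members compare equal even if they were sized differently.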
