0

I call a library function that accepts pointer to std::set and processes elements of it.

However, it processes only certain number of elements (let's say 100) and if the set has more elements it simply throws an exception. However, I receive set of much larger size. So I need efficient way to get subset of std::set.

Currently, I am copying 100 elements to temporary set and pass it to the function.

struct MyClass
{
    // Class having considerably large size instance
};

// Library function that processes set having only 100 elements at a time
void ProcessSet (std::set<MyClass>* ptrMyClassObjectsSet);

void FunctionToProcessLargeSet (std::set<MyClass>& MyClassObjSet)
{
    std::set<MyClass> MyClass100ObjSet;

    // Cannot pass MyClassObject as it is to ProcessSet as it might have large number of elements
    // So create set of 100 elements and pass it to the function
    std::set<MyClass>::iterator it;
    for (it = MyClassObjSet.begin(); it != MyClassObjSet.end(); ++it)
    {
        MyClass100ObjSet.insert (*it);

        if (MyClass100ObjSet.size() == 100)
        {
            ProcessSet (&MyClass100ObjSet);
            MyClass100ObjSet.clear();
        }
    }

    // Prrocess remaining elments
    ProcessSet (&MyClass100ObjSet);
    MyClass100ObjSet.clear();
}

But it's impacting performance. Can anyone please suggest better ways to do this?

Atul
  • 3,778
  • 5
  • 47
  • 87
  • 2
    Can you show a [mcve] of what you are currently doing? – NathanOliver Aug 27 '15 at 02:46
  • If the library requires a series of small `std::set` objects, then that is what you must create. The fact that they are subsets of something else doesn't seem to be helpful. Does the library offer no other alternative? Are they heap-allocated objects like `std::string`? – Potatoswatter Aug 27 '15 at 03:25
  • @Potatoswatter: Yes objects are heap allocated. I was thinking if it was array, I would have passed "sub part of original array just by specifying array indices" but in case of std::set I cannot do that (Or is there any way to do that?) – Atul Aug 27 '15 at 03:46

2 Answers2

1

Well, that sounds like a bad library design, but if you have to work with what you have then:

  • If library can accept a pair of iterators - that's the easy way to go using std::advance
  • If it's templated and can accept std::set<T>, then copying a part of your set to std::set<std::reference_wrapper<T>> might perform better if copying T is slow (see here to see that no copies are created)
  • If it only acceptsstd::set<ParticularObjectType>, I don't see a way around copying the data.

Hope this helps,

Rostislav.

Rostislav
  • 3,857
  • 18
  • 30
1

Since it looks like you are locked into having to use a subset. I have tweaked your code a little and I think it might be faster for you. It is still an O(n) operation but there is no branching in the for loop which should increase performance.

void FunctionToProcessLargeSet(std::set<MyClass>& MyClassObjSet)
{
    int iteration = MyClassOgjSet.size() / 100; // get number of times we have collection of 100
    auto it = MyClassObjSet.begin();
    auto end = MyClassObjSet.begin();
    for (; iteration == 0; --iteration)
    {
        std::advance(end, 100); // move end 100 away
        std::set<MyClass> MyClass100ObjSet(it, std::advance(it, end));  // construct with iterator range
        std::advance(it, 100); // advace it to end pos
        ProcessSet(&MyClass100ObjSet); // process subset
    }
    if (MyClassOgjSet.size() % 100 != 0)  // get last subset
    {
        std::set<MyClass> MyClass100ObjSet(it, MyClassObjSet.end());
        // Prrocess remaining elments
        ProcessSet(&MyClass100ObjSet);
    }
}

Let me know if this runs faster for you.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402