
Background

I have worked with C#/.NET, using LINQ wherever possible, and I am now trying my hand at C++ development for a project I am involved in. Of course, I fully realize that C# and C++ are two different worlds.

Question

I have an std::list<T> where T is a struct as follows:

struct SomeStruct {
    int id;
    int rate;
    int value;
};

I need to group the elements by rate and sum their value. How can I perform a GroupBy/Sum aggregation on this list?

Example:

SomeStruct s1;
SomeStruct s2;
SomeStruct s3;

s1.id=1;
s1.rate=5;
s1.value=100;

s2.id=2;
s2.rate=10;
s2.value=50;

s3.id=3;
s3.rate=10;
s3.value=200;

std::list<SomeStruct> myList;
myList.push_front(s1);
myList.push_front(s2);
myList.push_front(s3);

With these inputs I would like to get the following output:

rate|value
----|-----
   5|  100
  10|  250

I found a few promising libraries such as CINQ and cppitertools, but I couldn't fully understand them as I lack sufficient knowledge. It would be great if someone could point me in the right direction; I am more than willing to learn new things.

raidensan

1 Answer


Computing a Group-By sum is relatively straightforward:

using sum_type = int; // but maybe you want a larger type
auto num_groups = max_rate + 1; // max_rate: the largest rate value in the input
std::vector<sum_type> rate_sums(num_groups); // this is initialized to 0
for(const auto& s : myList) {
    rate_sums[s.rate] += s.value;
}

This works when the rate values lie between 0 and max_rate, and max_rate is not too large relative to myList.size(); otherwise the memory use may be excessive (and you'll have some overhead initializing the vector).

If the rate values are scattered over a large range relative to myList.size(), consider using a std::unordered_map instead of a std::vector.
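A minimal sketch of the map-based variant, reusing the question's struct (the function name sum_by_rate is mine):

```cpp
#include <list>
#include <unordered_map>

struct SomeStruct {
    int id;
    int rate;
    int value;
};

// Sum `value` per distinct `rate`; works for rates scattered over any range.
std::unordered_map<int, long long> sum_by_rate(const std::list<SomeStruct>& items) {
    std::unordered_map<int, long long> sums;
    for (const auto& s : items) {
        sums[s.rate] += s.value; // operator[] value-initializes a missing key to 0
    }
    return sums;
}
```

With the question's s1..s3 as input, the map ends up holding {5: 100, 10: 250}.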

The code above can also be parallelized. How best to parallelize it depends on your hardware, and there are all sorts of libraries to help you do this. Since C++17, the standard library itself offers parallel execution policies (std::execution) for many algorithms.
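As one hand-rolled illustration of the idea (plain std::thread, not tied to any particular library; the Item struct and function name are my own simplification): each thread sums its own chunk into a private map, and the partial maps are merged at the end, so no locking is needed during the scan.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <unordered_map>
#include <vector>

struct Item { int rate; int value; };

// Group-by sum over a contiguous vector, split across two workers.
std::unordered_map<int, long long>
parallel_sum_by_rate(const std::vector<Item>& items) {
    std::size_t mid = items.size() / 2;
    std::unordered_map<int, long long> lo, hi;

    auto sum_range = [&items](std::size_t b, std::size_t e,
                              std::unordered_map<int, long long>& out) {
        for (std::size_t i = b; i < e; ++i)
            out[items[i].rate] += items[i].value;
    };

    std::thread worker(sum_range, std::size_t{0}, mid, std::ref(lo)); // first half
    sum_range(mid, items.size(), hi);                                 // second half here
    worker.join();

    for (const auto& [rate, sum] : hi)  // merge the partial results
        lo[rate] += sum;
    return lo;
}
```

The per-thread-map-then-merge shape scales to N threads; the merge cost is proportional to the number of distinct rates, not the number of elements.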

Remember, though, that linked lists are rather slow to traverse, because you have to dereference an arbitrary address to get from one element to the next. If you can get your input in a std::vector or a plain array, that would be faster; and if you can't, parallelization is probably not worth the trouble.
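If the data does arrive as a std::list, materializing it into a contiguous buffer once is cheap relative to repeated pointer-chasing; a sketch (the helper name to_vector is mine):

```cpp
#include <list>
#include <vector>

struct SomeStruct {
    int id;
    int rate;
    int value;
};

// Copy the list into a cache-friendly vector before any heavy processing.
std::vector<SomeStruct> to_vector(const std::list<SomeStruct>& in) {
    return std::vector<SomeStruct>(in.begin(), in.end());
}
```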

einpoklum