I have a flat representation of a tree, shown in the table below. The unsorted data, held in a std::vector, is:
```
unsorted vector
(id)  (path)        (fn)  (line)  (extra)
1     /abc/file3.c  foo0  10      1
2     /abc/file3.c  foo0  15      2
3     /abc/file3.c  foo0  20      1
4     /abc/file3.c  foo1  30      1
5     /abc/file3.c  foo1  35      2
6     /abc/file3.c  foo1  40      1
7     /abc/file1.c  foo2  10      1
8     /abc/file1.c  foo2  15      2
9     /abc/file1.c  foo2  20      1
10    /abc/file3.c  baz1  70      1
11    /abc/file3.c  baz1  75      2
12    /abc/file3.c  baz1  80      1
13    /abc/file2.c  bat   10      1
14    /abc/file2.c  bat   15      2
15    /abc/file2.c  bat   17      2
16    /abc/file2.c  bat   20      1
17    /def/file2.c  baz   70      1
18    /def/file2.c  baz   71      1
19    /def/file2.c  baz   72      1
20    /def/file2.c  baz   73      1
```
The columns represent the ID, path, function, line number and 'extra' (the probe type, stored in the type field of the struct below). In tree form, the data is ordered hierarchically as path->function->lineNumber: each path contains multiple functions, each of which contains multiple lines of interest (probe points).
Each row in this table is represented with this struct:
```cpp
#include <string>

enum class Type : unsigned {
    One = 1,
    Two = 2
};

struct MyStruct {
    unsigned id;
    std::string filename;
    std::string function;
    unsigned lineNum;
    Type type;
};
```
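The path->function->lineNumber hierarchy can be pictured as nested sorted containers. The sketch below only illustrates that target shape and is not the author's code: it uses plain std::map rather than ranges, and the Row type, Tree alias and buildTree function are hypothetical names introduced for this illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical reduced row type: only the three hierarchy levels.
struct Row {
    std::string path;
    std::string function;
    unsigned line;
};

// path -> function -> line numbers (probe points)
using Tree = std::map<std::string, std::map<std::string, std::vector<unsigned>>>;

// Builds the tree from rows in any order. std::map keeps paths and
// functions sorted by key; line numbers are appended in input order.
Tree buildTree(const std::vector<Row>& rows) {
    Tree tree;
    for (const auto& row : rows) {
        tree[row.path][row.function].push_back(row.line);
    }
    return tree;
}
```

Here the containers do the grouping themselves, so no explicit sort is needed; the ranges approach below instead relies on sorting first so that equal keys are adjacent.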
The data is sorted according to the hierarchy described above using the following comparator:
```cpp
#include <tuple>

// comparator used for sorting: filename, then function, then line number, then type
static const auto customComp = [](const auto& lhs, const auto& rhs) {
    return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) <
           std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
};
```
We end up with the correctly ordered vector:
```
sorted vector
(id)  (path)        (fn)  (line)  (extra)
7     /abc/file1.c  foo2  10      1
8     /abc/file1.c  foo2  15      2
9     /abc/file1.c  foo2  20      1
13    /abc/file2.c  bat   10      1
14    /abc/file2.c  bat   15      2
15    /abc/file2.c  bat   17      2
16    /abc/file2.c  bat   20      1
10    /abc/file3.c  baz1  70      1
11    /abc/file3.c  baz1  75      2
12    /abc/file3.c  baz1  80      1
1     /abc/file3.c  foo0  10      1
2     /abc/file3.c  foo0  15      2
3     /abc/file3.c  foo0  20      1
4     /abc/file3.c  foo1  30      1
5     /abc/file3.c  foo1  35      2
6     /abc/file3.c  foo1  40      1
17    /def/file2.c  baz   70      1
18    /def/file2.c  baz   71      1
19    /def/file2.c  baz   72      1
20    /def/file2.c  baz   73      1
```
I need to parse this data using the standard ranges library (std::ranges) or range-v3 to efficiently recreate the tree structure from which the table originated. I chose ranges partly because I am learning my way through this complicated API, and partly because its lazy evaluation looks like an efficient way of handling large data sets.
The following code works (it is also on Godbolt), but it seems wrong. I am using a pair of nested chunk_by loops to parse the data, and I need a break to leave the middle loop (over the elements of each file chunk) after its first iteration.
The main body of the code is here:
```cpp
// comparator used for sorting
static const auto customComp = [](const auto& lhs, const auto& rhs) {
    return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) <
           std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
};

int
main() {
    print("unsorted vector", structs);
    // sort the probes (range-v3 action) so that equal keys become adjacent
    actions::sort(structs, customComp);
    const auto outerComp = [](auto&& lhs, auto&& rhs) {
        return lhs.filename == rhs.filename;
    };
    const auto innerComp = [](auto&& lhs, auto&& rhs) {
        return lhs.function == rhs.function;
    };
    print("sorted vector", structs);
    std::cout << std::endl;
    // split sorted list of probes into chunks by filename
    for (const auto& sources : structs | views::chunk_by(outerComp)) {
        for (const auto& next : sources) {
            auto outcomes = 0;
            // split each file's chunk into chunks by function
            for (const auto& functions : sources | views::chunk_by(innerComp)) {
                for (const auto& probe : functions) {
                    outcomes += (probe.type == Type::Two) ? 2 : 1;
                    std::cout << std::format("{}\n", probe);
                }
            }
            std::cout << next.filename << " outcomes [" << outcomes << "]\n";
            break; // only the first element is needed, for its filename
        }
        std::cout << "\n";
    }
}
```
Would it be possible to perform the sort and the double chunking in a single for loop? Ideally I would like to use the composition (pipe) syntax of the ranges API to achieve this.