Why does Intel oneAPI tbb::blocked_range3d give different values each time the code runs?

Question

I ran this example many times, and on every run the blocked_range3d version produces different values. I tested this because I converted my blocked_range2d+for code to blocked_range3d, and it produced a lot of wrong values. Does my code have a bug? Can someone help me find it? I also posted this issue on GitHub.

OS:Windows 11 64bit
oneTBB: 2021.6.0 lib\intel64\vc14
Dev env: VS2022 Release&Debug mode

Additional info:
Mainly, I need the values of i, j, and k to compute coordinates into the matrix. Once I switch to blocked_range3d, an expression such as a[i]+b[j]+c[k] no longer behaves the way it does with blocked_range2d+for, which seems weird.

It is also weird that it has to be blocked_range2d+for and not for+blocked_range2d; the latter also produces wrong values.
Here is the output:

Round:1
d3p size:7667
d3r size:26231
d3c size:91000

d2p size:700
d2r size:700
d2c size:91000

Round:2
d3p size:7000
d3r size:25559
d3c size:91000

d2p size:700
d2r size:700
d2c size:91000

Here is the code:

#include <oneapi/tbb.h>

#include <iostream>

using namespace oneapi::tbb;

int main() {
    {
        //blocked_range3d only
        concurrent_vector<size_t> d3p{};
        concurrent_vector<size_t> d3r{};
        concurrent_vector<size_t> d3c{};

        parallel_for(blocked_range3d<size_t>(0, 7, 0, 100, 0, 130), [&](blocked_range3d<size_t>& r) {
            for (size_t i = r.pages().begin(); i < r.pages().end(); ++i) {
                d3p.push_back(i);
                for (size_t j = r.rows().begin(); j < r.rows().end(); ++j) {
                    d3r.push_back(j);
                    for (size_t k = r.cols().begin(); k < r.cols().end(); ++k) {
                        d3c.push_back(k);
                    }
                }
            }
        });

        std::cout << "d3p size:" << d3p.size() << std::endl;
        std::cout << "d3r size:" << d3r.size() << std::endl;
        std::cout << "d3c size:" << d3c.size() << std::endl;
    }

    std::cout << std::endl;

    {
        //blocked_range2d + for
        concurrent_vector<size_t> d2p{};
        concurrent_vector<size_t> d2r{};
        concurrent_vector<size_t> d2c{};

        parallel_for(blocked_range2d<size_t>(0, 7, 0, 100), [&](blocked_range2d<size_t>& r) {
            for (size_t i = r.rows().begin(); i < r.rows().end(); ++i) {
                d2p.push_back(i);
                for (size_t j = r.cols().begin(); j < r.cols().end(); ++j) {
                    d2r.push_back(j);
                    for (size_t k = 0; k < 130; ++k) {
                        d2c.push_back(k);
                    }
                }
            }
        });

        std::cout << "d2p size:" << d2p.size() << std::endl;
        std::cout << "d2r size:" << d2r.size() << std::endl;
        std::cout << "d2c size:" << d2c.size() << std::endl;
    }

    return 0;
}

score 0 · Answer 1 · answered Oct 11 '22 at 11:31

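You are observing different values on each run because of the auto-partitioner, which parallel_for uses when no partitioner is specified. The auto-partitioner attempts to minimize range splitting while still providing opportunities for work stealing.

Because that splitting is non-deterministic, the sub-ranges passed to your lambda can differ from run to run. d3p and d3r get one push_back per outer or middle loop entry in every sub-range, so their sizes depend on how the range happened to be split. d3c gets one push_back per (i, j, k) cell, so it is always 7 * 100 * 130 = 91000.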

Refer to the below link for more details: https://spec.oneapi.io/versions/latest/elements/oneTBB/source/algorithms/partitioners/auto_partitioner.html
