
I am having a hard time understanding the relationship between omp parallel and vector slicing within each thread.

I was trying with std::vector but switched to std::valarray to try to minimize the slice copying, although I assume that is not relevant here.

For example, there is a vector:

std::valarray<custType> myvec(dataset.size());

I have some efficient functions that take a vector as input and do some computation, so I want to maximize the chunk size and pass the sliced valarray.

The general idea is:

    std::valarray<custType> myvec(dataset.size());
#pragma omp parallel for schedule(static)
    for (int i = 0; i < dataset.size(); i++) {
        // Get the start and end index of this chunk
        std::slice indexSlice(thread_start to thread_end); // pseudocode
        // Assign the original dataset's values to the newly calculated ones
        dataset[indexSlice] = vectorizedFunction(myvec[indexSlice]);
    }

When using omp parallel for, I don't understand how to get the whole range of indices assigned to a thread at once, so that I can slice the valarray and pass it to the function (hence the indexSlice being thread_start to thread_end above).

The other approach/idea was to make my own chunk definitions of (start, end) indices and then parallelize the for loop over those chunks, but this seems like an odd workaround when a parallelization library is already splitting the work. Is there an easier/cleaner/better way to do it (like what I am trying above), or is making my own set of chunk indices the right way?
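
To make that second idea concrete, here is a rough sketch of what I have in mind; custType, vectorizedFunction, dataset and myvec are stand-ins for my real types and the external API call, and chunkSize is just an arbitrary value I would have to tune:

    #include <algorithm>
    #include <cstddef>
    #include <valarray>

    using custType = double; // stand-in for my real element type

    // Stand-in for the external API call that takes a whole (sub)vector
    std::valarray<custType> vectorizedFunction(const std::valarray<custType>& in) {
        return in * custType(2); // dummy body
    }

    void processInChunks(std::valarray<custType>& dataset,
                         const std::valarray<custType>& myvec) {
        const std::size_t n = dataset.size();
        const std::size_t chunkSize = 1024; // arbitrary, to be tuned

        // Loop over chunk start indices; OpenMP hands whole chunks to threads
        #pragma omp parallel for schedule(static)
        for (std::size_t start = 0; start < n; start += chunkSize) {
            const std::size_t end = std::min(start + chunkSize, n);
            std::slice indexSlice(start, end - start, 1); // (start, length, stride)
            dataset[indexSlice] = vectorizedFunction(myvec[indexSlice]);
        }
    }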

  • I don't think having every thread (every index) run `vectorizedFunction` over the same slice is a good idea. – apple apple Nov 28 '22 at 18:17
  • @appleapple yes exactly, I don't want each data point to be passed individually; I want to slice the vector into a subvector whose indices are the indices of the chunk defined by `omp`. – 001001 Nov 28 '22 at 18:20
  • I was looking at this answer https://stackoverflow.com/a/53073735/12392112 but as it mentions, manually doing the chunk sizes might not be good. – 001001 Nov 28 '22 at 18:22
  • it's not only about the chunks; even if you get the correct chunk, the loop still iterates once per index, so each chunk would be processed `chunk_size` times. – apple apple Nov 28 '22 at 18:23
  • Are you sure you need to pass slices? If you can apply an OMP `simd` directive to your function you can use an ordinary loop over the vector. If your function is not too complicated I would even inline it in the parallel loop. It sounds to me like you're making life artificially hard for yourself and you're hiding information from the compiler. – Victor Eijkhout Nov 28 '22 at 18:26
  • @VictorEijkhout it's a function in a different API, so I can't really bring it into my own loop (which is an interface to the other one). – 001001 Nov 28 '22 at 18:29
  • `parallel for` is similar to a for loop except that the work is automatically shared between threads. There is no way to know the start/end indices because the schedule is not guaranteed to be static. Based on the provided information, what you need is to do the iteration-range splitting manually using a basic `parallel` region (see the sketch after these comments). It is generally not a good idea to implement for loops yourself this way since it is less flexible (you cannot adapt the schedule, for example). If you do not want to change `vectorizedFunction`, then this is the only possible solution. – Jérôme Richard Nov 28 '22 at 18:30
  • @appleapple sorry if the explanation is confusing; the for loop could be rewritten as `for (each chunk)`, but then I don't know how to get the indices. – 001001 Nov 28 '22 at 18:31
  • @001001 I'd say maybe just loop based on the size `vectorizedFunction` supports (i.e. define the chunk size yourself) and let OpenMP distribute the chunks to threads. – apple apple Nov 28 '22 at 18:32
  • By the way, if the `chunkSize` does not depend on the number of threads, then you can simply do something like `for (int start = 0; start < dataset.size(); start += chunkSize)` with `int end = min(start + chunkSize, dataset.size())` inside the loop. – Jérôme Richard Nov 28 '22 at 18:34
  • @JérômeRichard thank you, I think this is basically what I am after; although it should ideally be tied to the threads, it's close enough for now. – 001001 Nov 28 '22 at 18:45
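
For reference, a minimal sketch of the manual range splitting inside a plain `parallel` region that Jérôme Richard describes above (same placeholder names as in the earlier sketch; the exact splitting arithmetic is my own guess and ignores overflow for very large n):

    #include <cstddef>
    #include <omp.h>
    #include <valarray>

    // custType and vectorizedFunction as in the earlier sketch
    void processPerThread(std::valarray<custType>& dataset,
                          const std::valarray<custType>& myvec) {
        const std::size_t n = dataset.size();

        #pragma omp parallel
        {
            const int nthreads = omp_get_num_threads();
            const int tid = omp_get_thread_num();

            // Split [0, n) into nthreads nearly equal contiguous ranges
            const std::size_t begin = n * tid / nthreads;
            const std::size_t end = n * (tid + 1) / nthreads;

            if (end > begin) {
                std::slice indexSlice(begin, end - begin, 1);
                dataset[indexSlice] = vectorizedFunction(myvec[indexSlice]);
            }
        }
    }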

0 Answers