
This post, along with some tests, made it clear that Eigen does not apply multithreading to coefficient-wise operations such as cwiseProduct or Array multiplication, although matrix-matrix products can exploit multiple cores.

Still, with some optimization Eigen seems to be quite fast, and even if I try to write my own matrix library for particular purposes, I doubt it would be faster than Eigen, even with OpenMP enabled in my library.

  1. Why doesn't Eigen support OpenMP when it comes to coefficient-wise operations? Is it some sort of blunder by the developers, or are there specific reasons to avoid multithreading for these operations?
  2. Can I manually add OpenMP support for such operations? The Eigen code seems complicated, so it is hard to find the exact implementation of a particular function, even using Visual Studio's tools.
Kaiyakha

1 Answer


Why doesn't Eigen support OpenMP when it comes to coefficient-wise operations? Is it some sort of blunder by the developers, or are there specific reasons to avoid multithreading for these operations?

Using multiple threads by default would not be a good idea, as users can already use multiple threads in their own applications, and having nested parallel loops is clearly not efficient. Moreover, distributing the work for each operation can introduce significant overhead, which hurts basic operations on small/medium-sized arrays; Eigen is meant to be fast for both small and big arrays. Using OpenMP on top of Eigen is better in practice. This is especially true on NUMA systems due to the first-touch policy: hiding the multithreading can introduce surprising overheads due to remote accesses or page migrations.

For complex operations like LU decomposition, it is not reasonable to ask the user to parallelise the operation, as that would require rewriting most of the algorithm. This is why Eigen tries to parallelize such algorithms itself.

Can I manually add OpenMP support for such operations? The Eigen code seems complicated, so it is hard to find the exact implementation of a particular function, even using Visual Studio's tools.

Element-wise operations are trivial to parallelise with OpenMP. You can simply put a #pragma omp parallel for directive (possibly combined with a collapse clause) on a loop iterating from 0 to array.rows() (or array.cols()), using basic Eigen array indexing operators inside the loop body.


Note that while using multiple threads for basic operations seems interesting at first glance, in practice it is often disappointing in terms of speed-up on most machines. Indeed, reading/writing data in RAM is very expensive compared to applying arithmetic operations (e.g. add/sub/mul) to each item, as long as the operation is already vectorized using SIMD instructions. For example, on my machine one core can reach a throughput of 20-25 GiB/s, while the maximum throughput is 40 GiB/s with 6 cores (i.e. a speed-up of less than 2x with 6x more threads).

Jérôme Richard
  • I disagree with your first point. The sort of code that does Eigen-type computation usually does not run in a thread: it is the only thing running on a CPU. So OpenMP would actually make sense to me. If you had said that Eigen is used in MPI applications, then yes, extra threading makes no sense. But then, Eigen has no support for distributed memory. – Victor Eijkhout Apr 05 '22 at 01:23
  • @VictorEijkhout I think this sentence is unclear. I did not mean that there are other threads running doing other computations, but that the user can already use OpenMP at a higher level. For example, the user can parallelise a loop (using OpenMP) computing many 2D arrays of different sizes (and not huge ones). Note also that in that case, parallelising each 2D-array computation is not efficient. On NUMA systems, such an approach would actually be slower than sequential code. – Jérôme Richard Apr 05 '22 at 19:09
  • 1
    I think I did read you correctly. "computing many 2D arrays of different size" I think that is still less common (in the context of linear algebra problems, especially sparse) than having one large system. But it depends on the application. You probably come from a different background than I. – Victor Eijkhout Apr 05 '22 at 20:37