Questions tagged [openmp]

OpenMP is a cross-platform multi-threading API which allows fine-grained task parallelization and synchronization using special compiler directives.

OpenMP offers easy access to multi-threading without requiring knowledge of system-dependent details. At the same time, it is reasonably efficient compared to fine-tuned implementations, with the bonus of being one of the easiest ways to write multi-threaded code. Forums and complete information on OpenMP are available at https://openmp.org/.

OpenMP is based on a multi-threaded model and offers shared-memory parallelism and heterogeneous programming for coprocessors through compiler directives, library routines, and environment variables. It is restricted to C/C++ and Fortran applications, but it provides portability across different shared-memory architectures.

The programmer adds directives to the code, and the compiler uses them to introduce parallelism into the application. OpenMP can be used on both single-core and multi-core machines. When the code is compiled without OpenMP support, the directives are ignored and the application runs sequentially, so the same source stays portable across both kinds of machine.

The latest version is 5.2 (November 2021): Official OpenMP specifications.

Definitive Book Guide

Helpful links

6462 questions
2
votes
1 answer

Numbers not randomized after runs

I'm trying to create an OpenMP program that randomizes double arrays and runs the values through the formula: y[i] = (a[i] * b[i]) + c[i] + (d[i] * e[i]) + (f[i] / 2); If I run the program multiple times, I've realised that the y[] values are the same…
Ibrahim
  • 27
  • 5
2
votes
0 answers

Templating and OpenMP causes free(): double free detected in tcache 2

I've worked for a while to get my code to a minimal reproducible example and I think I have it. See the single main.cpp function below, compiled (on Linux) one of two ways: In serial: g++ -O3 --std=c++17 -o test_rho.exe main.cpp With OpenMP: g++…
drjrm3
  • 4,474
  • 10
  • 53
  • 91
2
votes
1 answer

OpenMP parallel loop much slower than regular loop

The whole program has been shrunk to a simple test: const int loops = 1e10; int j[4] = { 1, 2, 3, 4 }; time_t time = std::time(nullptr); for (int i = 0; i < loops; i++) j[i % 4] += 2; std::cout << std::time(nullptr) - time <<…
Kaiyakha
  • 1,463
  • 1
  • 6
  • 19
2
votes
1 answer

Faulty benchmark, puzzling assembly

Assembly novice here. I've written a benchmark to measure the floating-point performance of a machine in computing a transposed matrix-tensor product. Given my machine with 32GiB RAM (bandwidth ~37GiB/s) and Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz…
2
votes
1 answer

First touch in case of small sized data sharing on Linux

The "first touch" (a special term used to indicate virtual memory mapping in case of NUMA systems) write-operation causes the mapping of memory pages to the NUMA node associated with the thread which first writes to them. Having read this page,…
2
votes
1 answer

How do OpenMP thread ids work with recursion?

Here is a simple recursive program that splits into two for every recursive call. As expected, the result is 2 + 4 + 8 calls to rec, but the number of threads is always the same: two, and the ids bounce back and forth between 0 and 1. I expected…
2
votes
2 answers

Increasing array index in openMP

I am new to using OpenMP. I am trying to parallelize a nested loop, and so far I have something of this form... #pragma omp parallel for for (j=0;j
S2022
  • 35
  • 3
2
votes
3 answers

Problem of sorting OpenMP threads into NUMA nodes by experiment

I'm attempting to create a std::vector<std::set<…>> with one set for each NUMA-node, containing the thread-ids obtained using omp_get_thread_num(). Topo: Idea: Create data which is larger than L3 cache, set first touch using thread 0, perform…
Nitin Malapally
  • 534
  • 2
  • 10
2
votes
3 answers

How to optimize omp parallelization when batching

I am generating class Objects and putting them into std::vector. Before adding, I need to check if they intersect with the already generated objects. As I plan to have millions of them, I need to parallelize this function as it takes a lot of time…
2
votes
1 answer

Clang + OpenMP inefficient loop invariants

I came across some inefficient code generation by Clang while answering a different question (How do i parallelize this code using openmp with reduction) Let's consider this simple code: void scale(float* inout, ptrdiff_t n, ptrdiff_t m, ptrdiff_t…
Homer512
  • 9,144
  • 2
  • 8
  • 25
2
votes
1 answer

Installing OpenMP on Mac m1. 'clang: error: unsupported option '-fopenmp'' when running a setup.py

I am using the MacBook Pro M1. I have a Python package that I am trying to install which compiles C files and has the setup.py file as sources = ['*.c'], include_dirs=['##Directory Name##'], …
2
votes
1 answer

How can I realize data local spawning or scheduling of tasks in OpenMP on NUMA CPUs?

I have this simple self-contained example of a very rudimentary 2 dimensional stencil application using OpenMP tasks on dynamic arrays to represent an issue that I am having on a problem that is less of a toy problem. There are 2 update steps in…
user151387
  • 103
  • 7
2
votes
0 answers

OpenMP + Fortran on Apple M1 is slower than MPI+Fortran

I have a new MacBook pro with the Apple M1 Max processor (10 cores total), running OS 12.2.1. I used Homebrew to install gcc: ~/homebrew/bin/gcc-11 --version gcc-11 (Homebrew GCC 11.2.0_3) 11.2.0 Copyright (C) 2021 Free Software Foundation,…
2
votes
1 answer

Question to ARB on target construct limitations

In a research project we are developing a special-purpose floating-point accelerator. In this context, our original vision was to have a kind of "two-stage" or "nested" offload from ARM host -> RISCV-managed accelerator cluster -> actual…
2
votes
0 answers

Eigen matrix multiply triggers code to link to libomp.dll; it is already linked to libiomp5md.lib/dll. How do I stop this spurious linkage

A short complex piece of code is built with cmake; it links to Lapack(MKL), gmp/mpir, boost,.. and eigen3 (the versions are from vcpkg and the version of eigen is eigen3:x64-windows 3.3.9#1 ). I am currently testing it on Visual Studio 2019. At…
tjl
  • 131
  • 1
  • 5