Questions tagged [openmp]

OpenMP is a cross-platform multi-threading API which allows fine-grained task parallelization and synchronization using special compiler directives.

OpenMP offers easy access to multi-threading without requiring knowledge of system-dependent details. It is also reasonably efficient compared to fine-tuned implementations, with the added benefit of being one of the easiest ways to write multi-threaded code. Forums and complete information on OpenMP are available at https://openmp.org/.

OpenMP is based on a multi-threaded model and offers shared-memory parallelism and heterogeneous programming for coprocessors through compiler directives, library routines, and environment variables. It is restricted to C/C++ and Fortran applications, but it provides portability across different shared-memory architectures.
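As a minimal sketch of how the three pieces fit together, the C function below uses a directive (`#pragma omp parallel`) and a library routine (`omp_get_num_threads`), while the team size can be set from outside through the `OMP_NUM_THREADS` environment variable. The function name `count_team_threads` is hypothetical and chosen only for illustration.

```c
#ifdef _OPENMP
#include <omp.h>   /* OpenMP library routines; only available with OpenMP enabled */
#endif

/* Hypothetical helper: returns how many threads executed the parallel region.
   If the code is built without OpenMP support, the pragmas are ignored and
   the function returns 1. */
int count_team_threads(void)
{
    int nthreads = 1;

    #pragma omp parallel
    {
        /* Only one thread of the team performs the write; the implicit
           barrier at the end of "single" makes it visible to the others. */
        #pragma omp single
        {
#ifdef _OPENMP
            nthreads = omp_get_num_threads();
#endif
        }
    }
    return nthreads;
}
```

With GCC, for instance, building with `-fopenmp` and running with `OMP_NUM_THREADS=4` set in the environment would normally give a team of four threads, although the runtime is allowed to provide fewer.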

The programmer adds directives to the code, and the compiler uses them to introduce parallelism into the application. OpenMP code runs on both single-core and multi-core machines: on a single core the threads simply share the one processor, and when OpenMP support is not enabled at compile time the directives are ignored and the application executes sequentially. The same source therefore stays portable across both kinds of machine.
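The directive-based approach can be sketched with a small reduction loop. This is an illustrative example, not taken from the OpenMP documentation; the function names `sum_squares` and `compiled_with_openmp` are hypothetical. If the file is compiled without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result.

```c
/* Hypothetical example: sum of squares 1^2 + 2^2 + ... + n^2. */
long long sum_squares(int n)
{
    long long total = 0;

    /* Each thread accumulates into a private copy of "total"; the copies
       are combined with + when the loop ends. Without OpenMP this line is
       ignored and the loop is an ordinary sequential loop. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; ++i) {
        total += (long long)i * i;
    }
    return total;
}

/* Reports whether the translation unit was built with OpenMP enabled.
   The _OPENMP macro is defined by OpenMP-compliant compilers when the
   feature is switched on. */
int compiled_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

Calling `sum_squares(10)` yields 385 whether or not OpenMP is enabled, which is the portability property the paragraph above describes.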

Latest version is 5.2 (November 2021): Official OpenMP specifications.

Definitive Book Guide

Helpful links

6462 questions
2
votes
0 answers

OpenMP taskgroup issue with dependencies

I have a larger program working with tasks and dependencies and I would like to use the taskgroup construct. But, I found an issue working with them so I wrote the following simple code example. #include #include #include…
pierre
2
votes
1 answer

Standalone loop collapsing in OpenMP

I have a nested loop that looks as follows - for(int i=0; i
Atharva Dubey
2
votes
1 answer

In OpenMP how to specify task dependencies amongst functions invoked by different classes? Are global variables the only solution?

Let's say we have two classes: class A { void run_all() { #pragma omp task f1a(); #pragma omp task f1b(); } void f1a() { /*some code*/ } void f1b() { /*some code*/ } } class B { void…
astrophobia
2
votes
1 answer

MKL FFT performance on Intel Xeon 6248 - abrupt variations

I am working on an application that needs to Fourier transform batches of 2-dimensional signals, stored as single-precision complex floats. I wanted to test the idea of dissecting those signals into smaller ones and see whether I can improve…
2
votes
1 answer

In OpenMP task dependencies does the dependency clause parameter need to point to an actual variable?

Consider this code: include int main() { int x = 100; #pragma omp parallel { #pragma omp single { #pragma omp task depend (in: x) { x += 1; } #pragma omp task depend (out:…
astrophobia
2
votes
1 answer

Constructing distance matrix in parallel in C++11 using OpenMP

I would like to construct a distance matrix in parallel in C++11 using OpenMP. I read various documentations, introductions, examples etc. Yet, I still have a few questions. To facilitate answering this post, I state my questions as assumptions…
Chr
2
votes
1 answer

Make a reduction with OpenMP to compute the final summed value of an element of matrix

I have the following double loop where I compute the element of matrix Fisher_M[FX][FY]. I tried to optimize it by putting an OMP pragma #pragma omp parallel for schedule(dynamic, num_threads), but the gain is not as good as expected. Is there a way…
user1773603
2
votes
1 answer

Problem using pragma omp parallel for to compute pi

I have written the following code to compute the value of pi and it works: #include #include static long num_steps = 1000000; double step; #define NUM_THREADS 16 int main() { int i, nthreads; double tdata, pi,…
anotherone
2
votes
1 answer

OpenMP-behavior: Using ICC and GCC gives significantly different run times

For a small benchmark of OpenMP on an i7-6700K I wrote the following code: #include #include #include #include constexpr int bench_rounds = 32; int main(void) { using std::chrono::high_resolution_clock; …
arc_lupus
2
votes
1 answer

Intel VTune Profiler shows __mulq is a computationally expensive function in a fortran code

I'm trying to perform an audit on a rather complicated multi-physics model I'm working on and have been using Intel VTune Profiler to identify expensive subroutines. The most expensive function is a function called __mulq which is not something…
2
votes
1 answer

Impact of calling external C function on CPU time in OpenModelica

We have implemented an external delay function in C and we want to call it from our Modelica model (transmission line). Our goal is to reduce the CPU time. Unfortunately, it increased the CPU time. My questions are: Does calling the external…
2
votes
1 answer

GPU array addition using OpenMP

I am trying out OpenMP offloading with an NVIDIA GPU and I am trying to do some array calculations with it in C++. Right now my output is not what I expect, as I am new to offloading calculations with OpenMP. I would appreciate it if someone could point me…
2
votes
0 answers

Modules in Fortran OpenMP declared default shared

I want to declare all the variables in a Fortran MODULE as SHARED in an OpenMP parallel region. I can do this for COMMON blocks, e.g., SHARED(/commonname/). Is there a way to do this for modules?
Hank Happ
2
votes
1 answer

Is it safe to use omp_get_thread_num to index a global vector?

I have a code like this: thread_local CustomAllocator* ts_alloc = nullptr; struct AllocatorSetup { AllocatorSetup( int threadNum ) { static std::vector vec( (size_t)omp_get_max_threads() ); ts_alloc =…
Soonts
2
votes
3 answers

Performance issue while using C/OpenMP

I wrote some code to test the execution time of a small program using C and OpenMP, and I encountered an issue with the application's execution time. Here is the piece of code responsible for adding two vectors: float *x_f = (float *)malloc(sizeof(float) * DATA_SIZE); float…
Bogus