I'm working on a project to add AMD BLIS to a product that currently uses MKL and Intel OpenMP (iomp).

While some test cases show an improvement, others are a lot worse.

After profiling, I see the AMD version spending more time in gomp barrier and pthread functions than the Intel version spends in iomp kmp functions.

I don't have much experience with OMP. I was wondering whether the build options used for OMP might have much impact. This is with a locally built GCC 11.2, which uses:

GNU C17 11.2.0 -mtune=generic -march=x86-64 -g -O2 -ftls-model=initial-exec

Does gomp have any -march-specific optimizations to speed up barriers?
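
To separate the barrier cost from BLIS itself, a small microbenchmark along these lines might help compare the two runtimes. This is only a sketch: `barrier_cost_seconds` and `now_seconds` are hypothetical names made up for illustration, and the measured time also includes the start-up cost of the parallel region, so it is a rough upper bound rather than a precise per-barrier figure.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall-clock seconds when OpenMP is enabled,
   otherwise CPU time from clock() as a fallback. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: average seconds per barrier over `iterations`
   barriers executed by every thread in one parallel region.
   The result includes the cost of starting the parallel region. */
double barrier_cost_seconds(int iterations)
{
    assert(iterations > 0);

    double start = now_seconds();
    #pragma omp parallel
    {
        /* `i` is declared inside the region, so each thread has its own. */
        for (int i = 0; i < iterations; ++i) {
            #pragma omp barrier
        }
    }
    double elapsed = now_seconds() - start;

    return elapsed / iterations;
}
```

Calling `barrier_cost_seconds(1000000)` from a small `main`, building once with `gcc -O2 -fopenmp` and once against the Intel runtime, and running both with the same thread count should show whether the barrier itself accounts for the difference.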

Paul Floyd
