I have used gfortran for years but am quite new to nvfortran. Can anyone recommend useful nvfortran compiler flags for both debug and build modes?

The flags I know for debug mode are:

-C -g -Mbounds -traceback

and for build mode (with optimizations):

-O3 -Mconcur
imronuke
  • This is somewhat open-ended and opinion-based. It would be more objective to ask for equivalents of flags you know from other compilers, or for flags that do what you want them to do. – Vladimir F Героям слава Mar 28 '22 at 06:47
  • I mean, even in gfortran one uses different sets of flags depending on the exact task. Debugging flags are simpler, but you have `fcheck` and `fsanitize`. For optimizing there is a vast number of flag combinations, and testing is often necessary to find the best one. `O2` or `O3`? Or even `O5`? Fast math or not? Unroll loops or not? How much? – Vladimir F Героям слава Mar 28 '22 at 06:50
  • Do we have a community wiki answer for recommended compiler flags? – Ian Bush Mar 28 '22 at 08:42
  • @VladimirFГероямслава: Yes, you are right, but I intended this for typical, general problems. However, I would also appreciate it if someone were willing to explain the flags for specific problems based on his/her experience using nvfortran. – imronuke Mar 28 '22 at 23:35

1 Answer

We generally recommend using "-fast", "-O3", or "-fast -O3" for general performance. "-Mconcur" enables auto-parallelization which may or may not help. In general it's better to use explicit parallelization via OpenACC or OpenMP directives, or Fortran "DO CONCURRENT".
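Which of these is fastest for a given code usually has to be measured. Below is a minimal sketch of doing that, assuming a single-file program (main.f90 is a hypothetical name) and nvfortran on PATH. The hypothetical compare_opt_flags function builds the program with each of the three flag sets and times one run of each. Timing uses whole seconds from date, so it is only meaningful for runs that take a while.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVHPC): build one source file with each
# of the recommended optimization sets and time one run of each binary.
# Assumes nvfortran is on PATH; override the compiler with FC=...
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

compare_opt_flags() {
    src=$1
    fc=${FC:-nvfortran}
    n=0
    for flags in "-O3" "-fast" "-fast -O3"; do
        n=$((n + 1))
        exe="./app$n"
        # $flags is deliberately unquoted so "-fast -O3" becomes two options
        run "$fc" $flags -o "$exe" "$src" || return 1
        start=$(date +%s)
        run "$exe"
        end=$(date +%s)
        printf '%s: %s s\n' "$flags" "$((end - start))"
    done
}

# Usage: compare_opt_flags main.f90
```

Running each variant several times, with a realistic input, gives a more trustworthy comparison than a single run.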

Other potentially useful optimization flags:

-Mnouniform - Allow non-uniform computation of SIMD and scalar code. Faster, but may slightly reduce accuracy.

-Mstack_arrays - Allocate automatic arrays on the stack rather than the heap. Faster, but uses more stack space. You may need to increase the program's stack size limit in your shell environment (for example with "ulimit -s").

-Bstatic-nvidia - Link the compiler runtime libraries statically rather than dynamically.

-Mfprelaxed - Allow use of faster but reduced precision intrinsics and floating-point computations.

-mp[=gpu] - Enable OpenMP directives and optionally enable target offload to GPUs.

-acc[=multicore] - Enable OpenACC directives. Offloads to GPUs by default; use "multicore" to target multicore CPUs.

-stdpar[=gpu] - Enable parallelization of DO CONCURRENT to host or GPU.
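As an illustration of how the parallel model is selected by flags rather than by the source, here is a minimal sketch. The script writes a small SAXPY kernel that uses DO CONCURRENT, then builds it once without -stdpar and once with -stdpar=gpu. The file names and the write_saxpy/build_saxpy functions are hypothetical. The arrays are allocatable, which is the usual case for -stdpar GPU offload. Running the GPU build additionally needs an NVIDIA GPU and driver.

```shell
#!/bin/sh
# Hypothetical example: write a small DO CONCURRENT kernel and build it twice,
# once without -stdpar (ordinary host code) and once offloaded with -stdpar=gpu.
# Set DRY_RUN=1 to write the source but only print the compile commands.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

write_saxpy() {
    cat > "$1" <<'EOF'
program saxpy
  implicit none
  integer, parameter :: n = 1000000
  real, allocatable :: x(:), y(:)
  real :: a
  integer :: i
  allocate(x(n), y(n))
  a = 2.0
  x = 1.0
  y = 3.0
  ! Iterations are independent, so the compiler may run them in parallel
  do concurrent (i = 1:n)
    y(i) = a * x(i) + y(i)
  end do
  print *, 'y(1) =', y(1)
end program saxpy
EOF
}

build_saxpy() {
    fc=${FC:-nvfortran}
    write_saxpy saxpy.f90
    run "$fc" -O3 -o saxpy_host saxpy.f90             # no -stdpar: ordinary host code
    run "$fc" -O3 -stdpar=gpu -o saxpy_gpu saxpy.f90  # DO CONCURRENT offloaded to the GPU
}

# Usage: build_saxpy && ./saxpy_gpu
```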

The debugging flags are fine, though "-C" and "-Mbounds" both enable bounds checking so only one is needed.
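One way to keep the debug and optimized sets side by side is sketched below. It assumes a single source file. The function name nvf_flags is hypothetical, and choosing "-C" over "-Mbounds" is arbitrary, since either one enables bounds checking.

```shell
#!/bin/sh
# Hypothetical helper: print the nvfortran flags for a given build mode.
#   debug   -> debug symbols (-g), bounds checking (-C), traceback information
#   release -> the generally recommended optimization set
nvf_flags() {
    case "$1" in
        debug)   printf '%s\n' "-g -C -traceback" ;;
        release) printf '%s\n' "-fast -O3" ;;
        *)       printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Usage ($FFLAGS is left unquoted so it splits into separate options):
#   FFLAGS=$(nvf_flags debug)
#   nvfortran $FFLAGS -o app main.f90
```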

Another useful flag during development is "-Minfo". With it, the compiler reports which optimizations it is applying or is unable to apply. This can produce a lot of messages, so you can use sub-options to limit the output to particular types, such as "-Minfo=vect" to see which loops are or are not being vectorized. See "nvfortran -help -Minfo" for the full list of sub-options.
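For example, the sketch below compiles one file with vectorization feedback and saves the messages to a log for later reading. The file name solver.f90, the log naming, and the minfo_vect function are hypothetical. Both stdout and stderr are captured because the sketch makes no assumption about which stream the compiler writes the messages to.

```shell
#!/bin/sh
# Hypothetical helper: compile one file with -fast and save the -Minfo=vect
# feedback (which loops were or were not vectorized) to a log file.
# Both stdout and stderr are captured. DRY_RUN=1 only prints the command.
minfo_vect() {
    src=$1
    log=${src%.*}.minfo.log   # solver.f90 -> solver.minfo.log
    fc=${FC:-nvfortran}
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$fc -fast -Minfo=vect -c $src > $log 2>&1"
    else
        "$fc" -fast -Minfo=vect -c "$src" > "$log" 2>&1
    fi
}

# Usage:
#   minfo_vect solver.f90; less solver.minfo.log
#   nvfortran -help -Minfo    # lists the other -Minfo sub-options
```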

Mat Colgrove
  • Thanks. I tried -O3 combined with -fast on my code (which is private, so I cannot share it). But for some unknown reason, when I use -fast the build gets stuck (takes forever) in one of the modules, without any warning or error message. It works well if I use only -O3. – imronuke Mar 28 '22 at 23:40
  • Sorry about that. It's obviously a compiler issue, but I'd need a reproducer in order to report it. We can, though, try disabling some of the sub-options to see if that gets you past this. Run the command "nvfortran -help -fast" to see the flags that make up -fast, then add flags such as "-Mnovect", "-Mnounroll", "-Mnolre", or "-Mnopre" to disable individual sub-options and find which optimization is causing the long compilation time. There's also "-time", which shows the compilation time for the various compilation phases. – Mat Colgrove Mar 29 '22 at 15:07
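The procedure suggested in the last comment can be scripted. Below is a minimal sketch. It assumes the slow module is slow_mod.f90 (a hypothetical name) and that GNU coreutils "timeout" is available to stop a build that hangs. It compiles the module once per disabling flag and reports which variants finish within the limit. The four flags are the ones named in the comment. "nvfortran -help -fast" shows the full list for a given compiler version. The bisect_fast function is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compile one module with -fast plus each candidate
# "-Mno..." flag, under a time limit, to find which optimization makes
# compilation hang. -time prints per-phase compile times.
# Requires GNU coreutils `timeout`. DRY_RUN=1 only prints the commands.
bisect_fast() {
    src=$1
    limit=${2:-300}        # seconds allowed per build
    fc=${FC:-nvfortran}
    for off in -Mnovect -Mnounroll -Mnolre -Mnopre; do
        if [ -n "${DRY_RUN:-}" ]; then
            printf '%s\n' "timeout $limit $fc -fast $off -time -c $src"
            continue
        fi
        if timeout "$limit" "$fc" -fast "$off" -time -c "$src"; then
            printf '%s: finished\n' "$off"
        else
            printf '%s: failed or timed out\n' "$off"
        fi
    done
}

# Usage: bisect_fast slow_mod.f90 600
```

A variant that finishes while the others time out points to the optimization worth disabling, and to the information worth including in a bug report.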