Link-time optimizations in CUDA 11 - what are they and how to use them?

Question

In the CUDA 11 features announcement, it's said that there are now:

New link time optimization capabilities

What link-time optimizations does nvcc actually employ (e.g. relative to the LTO capabilities for host-side code with g++ or clang++)?

Also, is there something one needs to do to get LTO enabled, or does it always occur (unlike with host-side code, where you need to compile with an -flto switch)?

I haven't explored that new feature yet, but would *assume* that function inlining across compilation units is one of those capabilities. Should be easy enough to confirm or refute with a simple experiment. — njuffa, Feb 24 '21 at 21:57

einpoklum · Answer 1 · 2021-02-24T18:59:09.387

2

Partial answer:

To enable link-time optimization, pass --dlink-time-opt (or its short form, -dlto) when invoking nvcc, both for compilation and for device-side code linking. No link-time optimization will be applied if the compiler can't find the relevant intermediate information.
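As a rough sketch of what "both for compilation and for device-side code linking" looks like in practice, the script below writes two small source files and builds them with -dlto at each step. The file names (helper.cu, main.cu, lto_demo) and the scale function are made up for illustration, and you would normally also add an -arch flag matching your GPU. It assumes a CUDA toolkit recent enough to support -dlto.

```shell
#!/bin/sh
# Sketch only: file and function names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir"

cat > helper.cu <<'EOF'
// Lives in its own translation unit, so without link-time optimization
// the kernel in main.cu cannot have this function inlined into it.
__device__ float scale(float x) { return 2.0f * x; }
EOF

cat > main.cu <<'EOF'
#include <cstdio>

__device__ float scale(float x);  // defined in helper.cu

__global__ void kernel(float *out) { out[threadIdx.x] = scale(threadIdx.x); }

int main() {
    float *d_out;
    float h_out[4];
    cudaMalloc(&d_out, sizeof(h_out));
    kernel<<<1, 4>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    std::printf("%g %g %g %g\n", h_out[0], h_out[1], h_out[2], h_out[3]);
    return 0;
}
EOF

if command -v nvcc >/dev/null 2>&1; then
    # Compile: -dc produces relocatable device code, -dlto stores the
    # intermediate representation that the link step needs.
    # Link: -dlto must be given again, otherwise no LTO is performed.
    nvcc -dc -dlto helper.cu -o helper.o &&
    nvcc -dc -dlto main.cu -o main.o &&
    nvcc -dlto helper.o main.o -o lto_demo
else
    echo "nvcc not found; skipping the build" >&2
fi
```

Building the same files without -dlto and comparing the generated machine code (for example with cuobjdump -sass lto_demo) is one way to check whether the call to scale was actually inlined across the two files.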

edited Feb 24 '21 at 18:59

answered Feb 24 '21 at 12:20

einpoklum


score 0 · Answer 2 · answered Dec 06 '21 at 23:29


My guess is that -dlto has to be passed at both compile time and link time. If you link your program with something other than nvcc, such as gcc or g++, then you may not get the best performance.


Yuxiang Lin


I already said that. – einpoklum Dec 07 '21 at 07:27
