
I'm looking at GCC with NVPTX offloading (specifically on Windows/MinGW-w64), and I was wondering whether GCC itself can take advantage of this, so that it has more processing power for faster compiling/linking?

Or does this question make little sense as these processes are not mathematical enough in nature?

There's also the fact that GCC has some dependencies that are mathematical in nature (mpfr, gmp, mpc, isl), so maybe those could take advantage of offloading to make GCC faster using the GPU?
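
For context, the offloading I mean is the user-facing kind, where annotated loops are sent to the GPU through OpenMP or OpenACC directives. A minimal sketch of that usage, assuming an offload-enabled GCC build (the file name and sizes here are just illustrative):

    /* offload.c - toy OpenMP target-offload example */
    #include <stdio.h>

    #define N (1 << 20)
    static double x[N];

    int main(void)
    {
        double sum = 0.0;

        /* With NVPTX offloading configured this loop can run on the GPU;
           without it, the code simply falls back to the host. */
        #pragma omp target teams distribute parallel for map(tofrom: x, sum) reduction(+:sum)
        for (int i = 0; i < N; i++) {
            x[i] = i * 0.5;
            sum += x[i];
        }

        printf("sum = %f\n", sum);
        return 0;
    }

Built with something like gcc -fopenmp -foffload=nvptx-none offload.c -o offload. My question is whether GCC could use that same machinery internally for its own work.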

  • I would rather guess that the problems are not parallel enough in nature. And even if a part of them is, one always has to subtract data-transfer time. But I'm not a compiler expert, so I'm curious what others will say. – paleonix Dec 04 '20 at 19:40
  • @Paul: Real-world compilation is embarrassingly parallel, as most programs are made up of many independent Translation Units. Linking is the hard part, and to a large degree that's a system design problem - there is a very old design pattern that linkers use files on disk, and don't talk to compilers directly. – MSalters Dec 07 '20 at 10:36
  • @MSalters As long as you don't have tens of thousands of translation units this is not parallel enough for a GPU. The answer seems to make the same argument as me. – paleonix Dec 08 '20 at 07:33
  • @Paul: It's more an argument that Amdahl's Law applies. It might be possible to find even more parallelism inside each Translation Unit, but the serial bottleneck in linking remains. – MSalters Dec 08 '20 at 08:00
  • @MSalters Linking and Amdahl's law aren't even mentioned in the answer. But more importantly I would like to know how one would find enough parallelism? I mean I guess different functions could naively be compiled in parallel, but everything optimization related that needs to know the context would need to run sequentially, or not? Maybe that's rather the place to bring Amdahl's law into it. Especially since linking (even with LTO) in my experience doesn't seem to dominate compilation time (maybe it does for projects big enough to have enough parallelism for GPUs). – paleonix Dec 08 '20 at 08:17

2 Answers


"Can ...?" : No, it can't; otherwise it would be in the manual :-)

"Could ... ?": probably not; compilation is mostly walking over data-structures, not performing parallel arithmetic operations, and is not obviously parallel other than at a very high level. One pass requires the state which was created by a previous pass, so there is a strict ordering and you can't easily execute more than one pass in parallel. (Each pass is updating a single representation of the code).

The current approach is to use make -j8 or similar to compile multiple files simultaneously, but even there you are unlikely to have anywhere near enough parallelism to keep a GPU busy.
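
For comparison, this is how that file-level parallelism is typically exploited today (nothing GPU-specific, just the ordinary job-server approach):

    # compile every translation unit in parallel, one job per core, then link once
    make -j"$(nproc)"

    # or pick a fixed job count
    make -j8

Even a very large project offers parallelism on the order of its translation-unit count, far short of the tens of thousands of concurrent threads a GPU needs to stay busy.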

Jim Cownie

Can it use it? Yes. Should it? Probably not, but not for the reasons many people state.

tl;dr: Most toolchains are stuck using old and outdated techniques, data structures, and algorithms designed around the constraints of very old computers.

Contrary to many who claim compilation and linking are not parallelizable, they are. Oftentimes, linking is actually the slowest part of the process. Linking and compilation have essentially not been parallelized beyond "job server" implementations for two main reasons.

One, until fairly recently, most computers did not have enough memory or CPU threads to make such a technique worthwhile, and anyone with enough money to spend on enough GPUs to perform such a task would get a better ROI by simply buying multiple CPUs and doing distributed compilation.
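
distcc is the usual example of that distributed-compilation approach; a hypothetical setup across a few machines (the host names are placeholders) might look like:

    # farm compile jobs out over the network; linking still happens locally
    export DISTCC_HOSTS="localhost buildbox1 buildbox2"
    make -j12 CC="distcc gcc" CXX="distcc g++"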

Second, while newer techniques like link-time optimization (LTO, which also does compilation and code generation at link time) have improved the output of compilers and linkers, most of the tools are built on very old ideas and old code, and they carry a lot of cruft and weight that prevents advancement due to unruly codebases.
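
For reference, LTO in GCC is switched on with -flto; roughly speaking, the compile steps emit intermediate representation and the heavy code generation is deferred to the link step (details vary by GCC version):

    gcc -O2 -flto -c a.c              # emits GIMPLE bytecode in the object file
    gcc -O2 -flto -c b.c
    gcc -O2 -flto a.o b.o -o prog     # whole-program optimization happens here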

Regardless, it is still probably not worthwhile to use a GPU. Newer tools, like the mold linker, achieve order-of-magnitude speedups on CPUs alone. Mold has "reimplemented" as many of the basic linking tasks as possible to take advantage of modern parallel hardware and high memory availability. It does not yet support LTO, but it achieves near file-copying (maximum I/O bandwidth) speeds during linking. Using incremental/cached builds, Clang and Chrome can be linked in less than one second on a 32-core Threadripper processor, compared to about 60 seconds with GNU's gold linker or 10 seconds with lld on the same processor.

You can learn more about mold here, if you wish: https://github.com/rui314/mold
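
If you want to try it, the two usual ways to hook mold into an existing build are roughly these (assuming a Linux host; -fuse-ld=mold needs a reasonably recent Clang or GCC 12+):

    # wrap the whole build so every link step goes through mold
    mold -run make -j"$(nproc)"

    # or tell the compiler driver to use mold for the link step
    clang -fuse-ld=mold main.o util.o -o prog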

  • Thanks for the tip. I tried to build mold with MinGW-w64, but I guess it's not portable enough. Also, at first glance it seems to be an ELF linker, so I'm not sure it can be used for COFF/PE output. – Brecht Sanders Oct 01 '21 at 07:46
  • Yeah, it's still in development, unfortunately. It does state that there are plans for supporting other OSes, ISAs, and LTO, once the basic x86-64 implementation for Linux is fleshed out. Looks like Mach-O has some degree of implementation. So it's not exactly helpful now, sorry, haha. Wasn't sure if it would work with something like MinGW. – Emma Jane Bonestell Oct 01 '21 at 13:38