Distributing CUDA runtime to customers but it's too big

Question

At my company, we build software that runs on custom hardware, and we push it to customers whenever we release an update.

The GPU on that hardware is fixed, but we sometimes need to upgrade the CUDA and cuDNN runtimes when we upgrade dependencies in our software (such as libtorch).

The problem is that we then have to ship CUDA and cuDNN along with the update, which bloats the size of the binaries to over 2GB, while our actual executable is only 100MB. Is there any smart way around this?

*"How does static linking help?"* -- By not having to ship whole libraries separately? — MWB, Oct 13 '21 at 21:38
Yes but that is just going to bloat the size of the local library / .so file yes? — raaj, Oct 14 '21 at 00:49

score 1 · Answer 1 · answered Oct 14 '21 at 09:02

https://pytorch.org doesn't advertise it, but there is a static version of libtorch available (replace 'shared' with 'static' in the download URL).

Link against those libraries instead. Your binary will be a bit bigger (depending on how much of the library your code is using), but on the plus side you'll be saving 1.2GB there, because you don't have to ship the libraries.
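Before switching, it can help to see which shared libraries actually account for the size you ship. The following is only a rough sketch: `shared_lib_sizes` is a hypothetical helper, and it assumes your shared objects are named `*.so` or `*.so.<version>` under a single bundle directory.

```python
import os

def shared_lib_sizes(root):
    """Return [(size_bytes, path)] for shared objects under root, largest first."""
    results = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Match libfoo.so as well as versioned names such as libfoo.so.11
            if name.endswith(".so") or ".so." in name:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue  # skip symlinks so one library is not counted twice
                results.append((os.path.getsize(path), path))
    return sorted(results, reverse=True)
```

Calling `shared_lib_sizes("/path/to/your/bundle")` (path is a placeholder) and summing the first elements gives the total you would no longer need to ship separately.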

CUDA and cuDNN should also have static versions available, although they might be missing in some re-distributions (like in Anaconda).
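To check whether a particular install or re-distribution includes the static variants, you can look for `.a` archives in it. This is a minimal sketch, not an official tool: `find_static_archives` is a hypothetical helper, and the default `libcu` prefix is an assumption based on names like `libcudart_static.a`.

```python
from pathlib import Path

def find_static_archives(root, prefixes=("libcu",)):
    """Return sorted names of .a archives under root that start with a given prefix."""
    found = set()
    for path in Path(root).rglob("*.a"):
        # Keep only archives whose file name starts with one of the prefixes
        if path.name.startswith(tuple(prefixes)):
            found.add(path.name)
    return sorted(found)
```

Pointing it at your toolkit directory (often something like `/usr/local/cuda`, but that depends on your setup) should list the archives you can link against; an empty result suggests that distribution left them out.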

MWB


So, I just tried to compile libtorch myself in static build mode. The generated .a files all work when I try to do an `ld libtorch_cpu.so` etc. However, if I try to integrate them into my project, I get a whole bunch of linker errors. I don't know what's going on – raaj Oct 14 '21 at 17:57
@raaj That sounds like a separate question (and you'll need to provide a lot more info: error messages, specific commands, **minimal reproducible example**, etc.) – MWB Oct 14 '21 at 18:46
