At my company, we build software that we push to our customers whenever we release an update (it runs on custom hardware).
The GPU on that custom hardware is fixed, but when we upgrade dependencies in our software (such as libtorch), we sometimes also need to upgrade the CUDA and cuDNN runtimes.
The problem is that we therefore have to ship CUDA and cuDNN with our software, which bloats the package to over 2 GB, while our actual executable is only 100 MB. Is there any smart way around this?