I realize this is not the intended usage of TensorRT, but I am a bit stuck, so maybe there are some ideas out there. I have been provided with some neural network models as serialized TensorRT engines, so-called .trt files. These are basically models compiled and optimized from PyTorch to run on a specific GPU.
This works fine on my side, since I have a compatible GPU for development. For setting up CI/CD, however, I am having some trouble: the cloud servers that will run the tests do not have adequate GPUs for this CUDA-compiled "engine".
So, I would like to force these models to run on CPU, or otherwise find some other way to make them run. CPU would be just fine, because I only need to run a handful of inferences to check the output, and it does not matter if that is slow. Again, I know this is not the intended usage of TensorRT, but I need some output from the models for integration testing.
Alternative approach
The other idea I had was to convert the .trt files back to .onnx or another format that I could load into another runtime, or just into PyTorch or TensorFlow, but I cannot find any TensorRT tool that loads an engine and writes a model file. Presumably that is because the engine is "compiled" and no longer convertible. Yet the model parameters must be in there somewhere, so does anyone know how to do such a thing?
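To make the goal concrete, here is a minimal sketch of what I would like to end up with, assuming I could somehow obtain an .onnx version of each model (which is exactly the part I am missing). It uses ONNX Runtime restricted to the CPU provider. The helper names `random_feed` and `run_on_cpu` and the path `model.onnx` are hypothetical, and I am assuming all inputs are float32 and that dynamic dimensions can be set to 1.

```python
import numpy as np


def random_feed(input_specs, seed=0):
    """Build one random float32 array per (name, shape) pair.

    Assumption: dynamic dimensions (None or symbolic names such as
    "batch") can simply be replaced by 1 for a smoke test.
    """
    rng = np.random.default_rng(seed)
    feed = {}
    for name, shape in input_specs:
        concrete = [d if isinstance(d, int) and d > 0 else 1 for d in shape]
        feed[name] = rng.random(concrete, dtype=np.float32)
    return feed


def run_on_cpu(onnx_path):
    """Run a single inference on CPU and return the list of outputs.

    Assumption: an ONNX export of the model exists at onnx_path.
    """
    # Imported lazily so random_feed can be used without ONNX Runtime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    specs = [(inp.name, inp.shape) for inp in session.get_inputs()]
    return session.run(None, random_feed(specs))
```

In a CI test I would then call something like `outputs = run_on_cpu("model.onnx")` and compare the result against reference values recorded on the GPU machine, within some tolerance.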