
The processed data is real-time video (a sequence of frames), and it all needs to end up in a DX12 buffer.

I don't care too much if data gets copied to system memory during training, but during evaluation it must stay on the GPU.

I would train the network separately in Python, where high latency is acceptable, and then, once it is trained, run it entirely on the GPU (because my frames are already there). From my standpoint (experienced with GPGPU programming, but not so much with TensorFlow), there are two ways of doing this:

  1. Extracting the parameters of the trained model (weights and biases) in Python, uploading them to a C++ program that reproduces the same network topology on the GPU, and running it there. It should behave just like the TensorFlow network it was trained as. (A sketch of the Python-side export follows this list.)

  2. Using TensorFlow in the C++ program as well and just passing the buffer handles for input and output (the way you would with GPGPU), then interoperating with DX12 (because I need the evaluations to end up there).
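For option 1, what I have in mind on the Python side is roughly the following. This is only a minimal sketch assuming a tf.keras model; the function name `export_weights`, the file `weights.bin`, and the binary layout are all illustrative, not an established format:

```python
# Sketch of the Python-side export for option 1 (illustrative layout).
import struct

import numpy as np
import tensorflow as tf


def export_weights(model: tf.keras.Model, path: str) -> None:
    """Dump every weight tensor as raw float32, prefixed with its shape,
    so a C++ loader can rebuild the same topology on the GPU."""
    with open(path, "wb") as f:
        weights = model.get_weights()             # list of NumPy arrays (kernels and biases)
        f.write(struct.pack("<I", len(weights)))  # number of tensors
        for w in weights:
            w = np.ascontiguousarray(w, dtype=np.float32)
            f.write(struct.pack("<I", w.ndim))                # rank
            f.write(struct.pack(f"<{w.ndim}I", *w.shape))     # shape
            f.write(w.tobytes(order="C"))                     # row-major data, ready for upload


# Example usage after training (paths are hypothetical):
# model = tf.keras.models.load_model("trained_model.keras")
# export_weights(model, "weights.bin")
```

The C++ side would then read this file, upload the tensors into DX12 buffers, and run the same layer sequence in compute shaders.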

I would like to know whether either of these options is possible and, if so, which one is better and why.

If I left anything unclear, let me know in the comments.

