I'm trying to deploy a deep learning model with the TensorRT runtime. The model conversion step went fine and I'm fairly confident it is correct.
There are two parts I'm currently struggling with: copying data from host to device (e.g. from an OpenCV image into a TensorRT input buffer), and reading the output with the right shape so that I get the right data. So my questions are:
How does the shape of the input dims relate to the memory buffer? What is the difference between a model whose input dims are NCHW and one whose input dims are NHWC? When I read an image with OpenCV it is stored in HWC order, so if the model input is also NHWC, do I still have to rearrange the buffer data? If yes, what is the actual contiguous memory layout I have to produce? Or, put simply, what format or sequence of data is the engine expecting?
About the output (assuming the input is buffered correctly): how do I get the right result shape for each task (detection, classification, etc.)? E.g. an array or something similar to what I get when working in Python.
I read the NVIDIA docs and they are not beginner-friendly at all.
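To make the first question concrete, here is a minimal sketch of how I understand the two layouts, assuming a plain linear (row-major) buffer with no padding or vectorized tensor format. `offsetNHWC` and `offsetNCHW` are hypothetical helper names, not TensorRT functions:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: flat element offset of (n, h, w, c) in an NHWC buffer.
// The channels of one pixel sit next to each other (interleaved).
inline std::size_t offsetNHWC(std::size_t n, std::size_t h, std::size_t w, std::size_t c,
                              std::size_t H, std::size_t W, std::size_t C) {
    assert(h < H && w < W && c < C);
    return ((n * H + h) * W + w) * C + c;
}

// Hypothetical helper: flat element offset of (n, c, h, w) in an NCHW buffer.
// Each channel is stored as one complete H x W plane (planar).
inline std::size_t offsetNCHW(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                              std::size_t C, std::size_t H, std::size_t W) {
    assert(c < C && h < H && w < W);
    return ((n * C + c) * H + h) * W + w;
}
```

Under that assumption, the last dimension in the dims is always the one that changes fastest in memory: for dims {1, 386, 342, 3} the next channel of a pixel is 1 element away, while for dims {1, 3, 386, 342} it is 386 * 342 elements away.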
// Let's say I have a model that has a dynamic-shape input in NHWC format.
auto input_dims = nvinfer1::Dims4{1, 386, 342, 3}; //Using fixed H, W for testing
context->setBindingDimensions(input_idx, input_dims);
auto input_size = getMemorySize(input_dims, sizeof(float));
// How do I convert an OpenCV Mat into a buffer for these dims, and if I encounter a new input dim format, how do I adapt to that?
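A rough sketch of the conversion I have in mind, written without OpenCV so it stands alone. It assumes the image is an 8-bit, 3-channel, continuous `cv::Mat` (so `mat.data` points at H * W * C interleaved bytes), that the engine wants `float` input in linear layout, and that no normalization or BGR/RGB swap is needed, which depends on the model. Both function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleaved HWC bytes -> float buffer for an NHWC input.
// The element order already matches, so only the type changes.
std::vector<float> hwcBytesToFloatNHWC(const std::uint8_t* pixels,
                                       std::size_t H, std::size_t W, std::size_t C) {
    assert(pixels != nullptr);
    std::vector<float> out(H * W * C);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<float>(pixels[i]);
    }
    return out;
}

// Hypothetical helper: interleaved HWC bytes -> float buffer for an NCHW input.
// Each channel is gathered into its own contiguous H x W plane.
std::vector<float> hwcBytesToFloatNCHW(const std::uint8_t* pixels,
                                       std::size_t H, std::size_t W, std::size_t C) {
    assert(pixels != nullptr);
    std::vector<float> out(H * W * C);
    for (std::size_t h = 0; h < H; ++h) {
        for (std::size_t w = 0; w < W; ++w) {
            for (std::size_t c = 0; c < C; ++c) {
                out[(c * H + h) * W + w] = static_cast<float>(pixels[(h * W + w) * C + c]);
            }
        }
    }
    return out;
}
```

The resulting vector would then be copied to the device with something like `cudaMemcpyAsync(input_mem, buf.data(), buf.size() * sizeof(float), cudaMemcpyHostToDevice, stream)`, where `buf` is the returned vector.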
The expected output dims are something like (1,32,53,8), for example. The output buffer comes back as a plain pointer, and I don't know the order of the data in it, so I can't reconstruct the expected array shape.
// Run TensorRT inference
void* bindings[] = {input_mem, output_mem};
bool status = context->enqueueV2(bindings, stream, nullptr);
if (!status)
{
std::cout << "[ERROR] TensorRT inference failed" << std::endl;
return false;
}
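// output_size is in bytes, so divide by the element size; the array form int[] makes unique_ptr use delete[]
auto output_buffer = std::unique_ptr<int[]>{new int[output_size / sizeof(int)]};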
if (cudaMemcpyAsync(output_buffer.get(), output_mem, output_size, cudaMemcpyDeviceToHost, stream) != cudaSuccess)
{
std::cout << "ERROR: CUDA memory copy of output failed, size = " << output_size << " bytes" << std::endl;
return false;
}
cudaStreamSynchronize(stream);
// How do I use this output_buffer to form the right output shape, (1,32,53,8) in this case?
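What I would like to end up with is something like the sketch below: a thin view that indexes the flat host buffer by its 4-D shape. It assumes the output is in linear (row-major) layout with the last dimension contiguous and that the dims match what `context->getBindingDimensions(output_idx)` reports. `Tensor4View` is a hypothetical type of my own, not part of TensorRT:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over a flat, row-major 4-D buffer.
template <typename T>
struct Tensor4View {
    const T* data;               // flat host buffer, e.g. output_buffer.get()
    std::size_t d0, d1, d2, d3;  // shape, e.g. 1, 32, 53, 8

    // Total number of elements covered by the view.
    std::size_t size() const { return d0 * d1 * d2 * d3; }

    // Element at (i0, i1, i2, i3); the last index moves fastest in memory.
    const T& at(std::size_t i0, std::size_t i1, std::size_t i2, std::size_t i3) const {
        assert(i0 < d0 && i1 < d1 && i2 < d2 && i3 < d3);
        return data[((i0 * d1 + i1) * d2 + i2) * d3 + i3];
    }
};
```

With this, `Tensor4View<int>{output_buffer.get(), 1, 32, 53, 8}.at(0, y, x, k)` would read element `k` of cell `(y, x)`, provided the element type really is `int`.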