
Problem: inference results from DeepStream and local inference do not match (using the same PNG images).

When I tested what percentage of predictions match between the engine and the .pth model, only 26% of 180k images matched.

How I reproduce results: I save images after they go through streammux, in 416x416 shape and .png format. For each image I also save the bounding box coordinates where YoloV4 detected objects. To test predictions, I download the images and bounding box coordinates, crop each object based on its bounding box, and run the resulting crop through the .pth model (see the sketch after the test_transforms block below).

Version: DeepStream 5.1

Model training: I train EfficientNetB0 locally with PyTorch and use the following transformations for loading data (we are training 128 classes):

import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transforms = A.Compose(
    [
        A.Resize(height=224, width=224),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.RandomGamma(gamma_limit=(75, 90), p=0.8),
        A.GridDropout(ratio=0.47, p=0.6),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ]
)
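
These transforms are applied per sample when loading the data, roughly like this (simplified sketch; the dataset class and label handling here are only illustrative, not my exact training code):

import cv2
from torch.utils.data import Dataset

class CropDataset(Dataset):
    """Simplified sketch of how the Albumentations pipeline is applied per sample."""

    def __init__(self, image_paths, labels, transforms):
        self.image_paths = image_paths
        self.labels = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Albumentations expects an HWC numpy array; OpenCV reads BGR, so convert to RGB.
        image = cv2.cvtColor(cv2.imread(self.image_paths[idx]), cv2.COLOR_BGR2RGB)
        image = self.transforms(image=image)["image"]
        return image, self.labels[idx]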

I run model inference locally with the following preprocessing:

test_transforms = A.Compose(
    [
        A.Resize(height=224, width=224),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ]
)
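
The per-crop check described earlier then looks roughly like this (simplified sketch: the bbox layout (left, top, width, height), the BGR-to-RGB conversion, and how `model` is loaded are placeholders for my actual script):

import cv2
import torch

def classify_crop(model, image_path, bbox):
    """Classify one saved DeepStream crop with the local .pth model (sketch)."""
    # Load the 416x416 PNG saved after streammux; OpenCV reads BGR, so convert to RGB.
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)

    # Crop the object using the saved YoloV4 bounding box (assumed left/top/width/height).
    left, top, width, height = bbox
    crop = image[top:top + height, left:left + width]

    # Apply the same preprocessing as local inference above and add a batch dimension.
    tensor = test_transforms(image=crop)["image"].unsqueeze(0)

    model.eval()
    with torch.no_grad():
        output = model(tensor)
    return int(output.argmax(dim=-1))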

To export the model:

First, convert the trained model to .onnx:

import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0  # assuming torchvision's implementation

# Rebuild the architecture, swap the classifier head for the 128 classes,
# load the trained weights, and append a softmax before export.
model = efficientnet_b0(pretrained=False)
pt_model = torch.load(path_to_torch_model, map_location=torch.device("cpu"))
n_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(n_features, classes)
model.load_state_dict(pt_model)
model = nn.Sequential(model, nn.Softmax(-1))

dummy_input = torch.randn(batch_size, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    path_to_onnx,
    verbose=False,
    input_names=["input_names"],
    output_names=["output_names"],
    export_params=True,
)

I checked that the converted ONNX model gives the same results as the PyTorch model.
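
The check was along these lines (illustrative sketch with onnxruntime; `model` and `path_to_onnx` are the objects from the export snippet above):

import numpy as np
import onnxruntime as ort
import torch

# Compare the exported ONNX model against the PyTorch model on the same random input.
session = ort.InferenceSession(path_to_onnx)
dummy = torch.randn(1, 3, 224, 224)

model.eval()
with torch.no_grad():
    torch_out = model(dummy).numpy()

onnx_out = session.run(["output_names"], {"input_names": dummy.numpy()})[0]

# Only tiny numerical differences are expected here.
print(np.abs(torch_out - onnx_out).max())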

Then export the .onnx model to an engine file with the following command:

docker container run \
       --gpus all \
       --rm \
       --volume $(pwd):/workspace/ \
       --volume $(pwd):/data/ \
       --workdir /workspace/ \
       nvcr.io/nvidia/tensorrt:21.02-py3  \
       trtexec --explicitBatch \
       --onnx=best_23.onnx \
       --saveEngine=efficientnet.engine \
       --fp16 \
       --workspace=4096

DeepStream configuration:

RTSP stream → Streammux (reshaping to 416x416) → YoloV4 (bounding boxes) → Classification

DeepStream classification config:

[property]
gpu-id=0
offsets=103.53;116.28;123.675
net-scale-factor=0.01735207357279195
labelfile-path=../classifier/labels.txt
model-engine-file=…/classifier/efficientnet.engine
infer-dims=3;224;224
network-mode=2
network-type=1
num-detected-classes=128
interval=0
classifier-threshold=0
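
For context, I compare the two preprocessing pipelines numerically with a script like the one below. It assumes nvinfer applies y = net-scale-factor * (x - offsets) to the raw pixel values; the channel order in which the offsets are applied is a guess on my part and depends on model-color-format:

import numpy as np

# Compare the local (Albumentations) preprocessing with DeepStream-style preprocessing.
# Values are copied from the transforms and the nvinfer config above.
rgb = np.random.randint(0, 256, size=(224, 224, 3)).astype(np.float32)  # stand-in for a resized crop

# Local inference: Normalize on [0, 1] values with per-channel mean and std.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
local = (rgb / 255.0 - mean) / std

# DeepStream nvinfer (assumed formula): y = net-scale-factor * (x - offsets),
# applied to raw 0-255 values with a single scalar scale factor.
offsets = np.array([103.53, 116.28, 123.675], dtype=np.float32)
net_scale_factor = 0.01735207357279195
deepstream = net_scale_factor * (rgb - offsets)

# Per-channel maximum difference between the two pipelines.
print(np.abs(local - deepstream).max(axis=(0, 1)))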

Questions:

  1. How can I make the preprocessing used during training in Python match the preprocessing DeepStream applies at inference time? I suspect the albumentations package uses a different interpolation when resizing than DeepStream does.
  2. Are there any other mistakes that I haven't noticed?
