Detectron2 Speed up inference instance segmentation

Question

I have working instance segmentation using the "mask_rcnn_R_101_FPN_3x" model. Inference takes about 3 seconds per image on a GPU. How can I speed it up?

I code in Google Colab

This is my setup config:

import os

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.OUTPUT_DIR = "/content/drive/MyDrive/TEAM/save/"
cfg.DATASETS.TRAIN = (train_name,)
cfg.DATASETS.TEST = (test_name,)
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")

This is inference:

import time

import cv2
import torch
from detectron2.engine import DefaultPredictor

torch.backends.cudnn.benchmark = True
start = time.time()

predictor = DefaultPredictor(cfg)
im = cv2.imread("/content/drive/MyDrive/TEAM/mcocr_val_145114ixmyt.jpg")
outputs = predictor(im)

print(f"Inference time per image is : {(time.time() - start)} s")

Return time:

Inference time per image is : 2.7835421562194824 s

The image I run inference on is 1024 x 1024 pixels. I have tried different sizes, but inference still takes about 3 seconds per image. Am I missing anything about Detectron2?

More information about the GPU: (image not included)

Please replace the image links by text as it helps engines to reference S.O. posts and readers too. — Jérôme Richard, Apr 10 '21 at 15:34
K80s are kind of slow GPUs by today's standards. I think this is expected, especially because you're measuring not only the inference, but also model setup and image loading. — Berriel, Apr 10 '21 at 15:34
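Following up on the comment above, here is a minimal sketch of timing only the forward pass. The helper name `time_inference` and its parameters are hypothetical, not part of Detectron2. It assumes the model is already built and the image already loaded, runs a few untimed warm-up calls, and optionally synchronizes the device before reading the clock.

```python
import time


def time_inference(fn, arg, warmup=3, runs=10, sync=None):
    """Return the mean wall-clock seconds per call of fn(arg).

    The warm-up calls run first and are not timed, so one-off costs
    (cuDNN autotuning, memory allocation) stay out of the measurement.
    sync is an optional zero-argument callable that blocks until queued
    device work has finished, e.g. torch.cuda.synchronize on a CUDA GPU.
    """
    for _ in range(warmup):
        fn(arg)
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(runs):
        fn(arg)
    if sync is not None:
        sync()  # wait for asynchronous GPU work before stopping the clock
    return (time.perf_counter() - start) / runs
```

With the objects from the question, this would be called once `predictor = DefaultPredictor(cfg)` and `im = cv2.imread(...)` have already run, for example `time_inference(predictor, im, sync=torch.cuda.synchronize)`.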

dragon7 · Answer 1 · 2022-07-28T12:47:29.280

There is a third way. You could use a faster inference toolkit, e.g. OpenVINO. OpenVINO is optimized specifically for Intel hardware, but it should work with any CPU. It optimizes your model by converting it to an Intermediate Representation (IR), pruning the graph, and fusing some operations into others while preserving accuracy. It then uses vectorization at runtime.

If you are able to export your Detectron2 model to ONNX, you can use OpenVINO. You can find a full tutorial on how to convert the ONNX model, along with a performance comparison, here. Some snippets are below.

Install OpenVINO

The easiest way to do it is with pip, especially if you use Google Colab.

pip install openvino-dev[onnx]

Use Model Optimizer to convert ONNX model

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the ONNX model to IR, which is the default format for OpenVINO. You can also try FP16 precision, which should give you better performance (just change data_type). Run in the command line:

mo --input_model "model.onnx" --input_shape "[1,3, 224, 224]" --mean_values="[123.675, 116.28 , 103.53]" --scale_values="[58.395, 57.12 , 57.375]" --data_type FP32 --output_dir "model_ir"

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (integrated into your CPU, like Intel HD Graphics). If you don't know which device is the best choice for you, just use AUTO.

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image (input_image is assumed to be an
# already preprocessed array that matches the model's input shape)
result = compiled_model_ir([input_image])[output_layer_ir]

Disclaimer: I work on OpenVINO.

score 0 · Answer 2 · answered Apr 15 '21 at 13:52

These are the two best ways to decrease inference time:

Use a better GPU
Use a shallower network, for example R50. Compare the inference times here: https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md

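Changing the size of the image file will probably not change the inference time. The number of parameters in Mask R-CNN does not depend on the image size, and DefaultPredictor resizes every input according to cfg.INPUT.MIN_SIZE_TEST and cfg.INPUT.MAX_SIZE_TEST before it reaches the network. Lowering those two settings does reduce the compute per image, usually at some cost in accuracy.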
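To illustrate why differently sized files can cost the same, here is a pure-Python sketch of a shortest-edge resize. It is my own re-implementation of the rule that, as far as I know, Detectron2 applies at test time; the function name `resized_shape` is hypothetical, and the defaults of 800 and 1333 are assumed to match the values of cfg.INPUT.MIN_SIZE_TEST and cfg.INPUT.MAX_SIZE_TEST in the standard configs.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Return the (height, width) an image has after a shortest-edge resize.

    The shorter side is scaled to min_size; if the longer side would then
    exceed max_size, both sides are scaled down so that it equals max_size.
    """
    scale = min_size / min(height, width)
    if height < width:
        new_h, new_w = float(min_size), scale * width
    else:
        new_h, new_w = scale * height, float(min_size)
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    # Round to the nearest integer pixel count.
    return int(new_h + 0.5), int(new_w + 0.5)


for side in (512, 1024, 2048):
    print(side, "->", resized_shape(side, side))
```

Under these assumptions, all three square inputs map to 800 x 800, so the network does the same amount of work for each. Passing a smaller `min_size` shows how lowering cfg.INPUT.MIN_SIZE_TEST would shrink the tensor the model actually sees.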


gap210


Can you post an example of using a shallow network with Detectron2, please? – eTothEipiPlus1 Dec 08 '21 at 20:40
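In reply to the comment above, a minimal sketch of switching to the R50 backbone might look like the following. The helper name `build_r50_cfg` is hypothetical. The sketch assumes you fine-tune again starting from the COCO-pretrained R50 checkpoint, because a model_final.pth trained with the R101 config was trained for a different backbone.

```python
R50_CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"


def build_r50_cfg(num_classes=1):
    """Build a Mask R-CNN R50-FPN config initialised from COCO weights."""
    # Imported here so this module can be loaded without detectron2 installed.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(R50_CONFIG))
    # Starting point for fine-tuning (assumption: the model is retrained);
    # replace with your own checkpoint once training has finished.
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(R50_CONFIG)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes
    return cfg
```

The rest of the setup from the question (cfg.OUTPUT_DIR, cfg.DATASETS.TRAIN, cfg.DATASETS.TEST) can be applied to the returned cfg unchanged. After training, point cfg.MODEL.WEIGHTS at the new model_final.pth and pass cfg to DefaultPredictor as before.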
