I converted my model to ONNX and then ran the onnxruntime transformer optimization step. The optimized model loads successfully in onnxruntime, and its logits match those of the native model. However, after moving the model to Triton Inference Server, I get the following error at the model loading step:
Unrecognized attribute: mask_filter_value for operator Attention
Library versions are as follows:
onnx: 1.13.1
onnxruntime: 1.14.1
torch: 1.13.1
onnxruntime-tools: 1.7.0
onnxconverter-common: 1.13.0
opset_version: 11
I tried two versions of Triton Inference Server. Both gave the same error:
nvcr.io/nvidia/tritonserver:21.04-py3
nvcr.io/nvidia/tritonserver:23.02-py3
Could something still be wrong on the ONNX Runtime side, even though the logits match exactly? Has anyone else faced this error?
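To help narrow this down, here is a minimal sketch that lists which Attention nodes in the optimized model actually carry the mask_filter_value attribute named in the error. It assumes the error comes from a mismatch between the onnxruntime version that wrote the optimized model and the one bundled in the Triton image, which I have not confirmed. The function names and the "model.onnx" path are placeholders of mine, not part of any library.

```python
def find_nodes_with_attribute(model, op_type="Attention",
                              attr_name="mask_filter_value"):
    """Return (node name, domain) for each top-level graph node of the
    given op_type that carries an attribute called attr_name.

    `model` is expected to look like an onnx.ModelProto, i.e. expose
    model.graph.node, where each node has op_type, name, domain, attribute.
    Nodes nested inside subgraphs (If/Loop bodies) are not visited.
    """
    hits = []
    for node in model.graph.node:
        if node.op_type != op_type:
            continue
        if any(attr.name == attr_name for attr in node.attribute):
            hits.append((node.name, node.domain))
    return hits


def report(path):
    """Load an ONNX file and print the Attention nodes that set the attribute.

    `path` is a placeholder for the optimized model file, e.g. "model.onnx".
    """
    import onnx  # imported lazily so the helper above has no hard dependency

    model = onnx.load(path)
    for name, domain in find_nodes_with_attribute(model):
        print(f"{name} (domain={domain!r}) sets mask_filter_value")
```

Calling report("model.onnx") on the optimized file should show whether the attribute was written by the optimization step. If it was, comparing the onnxruntime version used for optimization (1.14.1 here) with the one shipped in each Triton image would be the next thing I would check.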