How to set up a configuration file for SageMaker Triton inference?

Question

I have been looking at examples and ran into this one from AWS: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/ensemble/sentence-transformer-trt/examples/ensemble_hf/bert-trt/config.pbtxt. Based on this example, we need to define the inputs and outputs and the data type of each. The example is not clear on what dims (probably dimensions) represents: is it the number of elements in an array of inputs? Also, what is max_batch_size? At the bottom, we have to specify instance_group, where kind is set to KIND_GPU. I assume that if we are using a CPU-based instance, we can change this to CPU. Do we also need to specify how many CPUs we want to use?

name: "bert-trt"
platform: "tensorrt_plan"
max_batch_size: 16
input [
  {
    name: "token_ids"
    data_type: TYPE_INT32
    dims: [128]
  }...
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [128, 384]
  }...
]
instance_group [
  {
    kind: KIND_GPU
  }
]
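
For reference, here is a sketch of how these fields are usually read. It is based on Triton's model configuration documentation rather than on the AWS example itself. It is a fragment: only the one input and one output shown above are repeated, and the inputs and outputs elided with "..." are left out. The KIND_CPU block is a hypothetical variant. A tensorrt_plan model needs a GPU, so that block would only apply to a model whose backend can run on CPU.

```
# When max_batch_size > 0, Triton adds the batch dimension implicitly,
# so dims describes the shape of ONE item in the batch, not the whole batch.
max_batch_size: 16        # largest batch the model accepts in one inference call

input [
  {
    name: "token_ids"
    data_type: TYPE_INT32
    dims: [128]           # full tensor shape: [batch, 128], with batch <= 16
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [128, 384]      # full tensor shape: [batch, 128, 384]
  }
]

# Hypothetical CPU variant (not part of the AWS example).
instance_group [
  {
    kind: KIND_CPU
    count: 2              # number of model instances, not number of CPU cores
  }
]
```

Read this way, dims: [128] means each request item carries 128 token IDs, and count sets how many copies of the model Triton runs in parallel rather than reserving CPU cores.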

I have tested the given example, but if we want to send text input and do the tokenization on the server, what would this config.pbtxt file look like?
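
One possible shape for such a file is sketched below. It assumes tokenization runs in a separate model on Triton's Python backend, placed in front of bert-trt (for example through an ensemble). The model name "tokenizer" and the input name "INPUT_TEXT" are made up for illustration, and a model.py implementing the tokenization is assumed to sit next to this file. Only the token_ids output is shown. The real outputs would have to match every input that bert-trt declares, including the ones elided above.

```
name: "tokenizer"            # hypothetical model name
backend: "python"            # Python backend; the logic lives in model.py
max_batch_size: 16

input [
  {
    name: "INPUT_TEXT"       # hypothetical tensor name
    data_type: TYPE_STRING   # raw text is sent as a string (BYTES) tensor
    dims: [1]                # one string per batch item
  }
]
output [
  {
    name: "token_ids"        # matches the bert-trt input of the same name
    data_type: TYPE_INT32
    dims: [128]
  }
]

instance_group [
  {
    kind: KIND_CPU           # tokenization typically runs on CPU
  }
]
```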

