Questions tagged [triton]

Triton is an open-source project providing hybrid cloud computing infrastructure, sponsored by Joyent.

Triton was formerly named SmartDataCenter, and the GitHub repository still uses the terms SDC and SmartDataCenter interchangeably. Note that many questions under this tag instead concern NVIDIA's Triton Inference Server, the Triton GPU kernel language used by PyTorch 2.0, or the Triton binary analysis framework.

29 questions
0
votes
0 answers

How to set up a configuration file for SageMaker Triton inference?

I have been looking at examples and ran into this one from AWS: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/ensemble/sentence-transformer-trt/examples/ensemble_hf/bert-trt/config.pbtxt. Based on this example, we need to…
suwa
  • 23
  • 4
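
Editor's note: a Triton model repository needs a config.pbtxt next to a numbered version directory. Below is a minimal sketch of generating one, assuming a hypothetical TensorRT model; the names, dtypes, and dims are placeholders, not values from the linked AWS example.

# Hypothetical sketch: write a minimal Triton config.pbtxt.
# All names, dtypes, and dims below are placeholders.
from pathlib import Path

config = """
name: "bert-trt"
platform: "tensorrt_plan"
max_batch_size: 16
input [
  {
    name: "token_ids"      # placeholder tensor name
    data_type: TYPE_INT32
    dims: [ 128 ]
  }
]
output [
  {
    name: "embeddings"     # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]
"""

# Triton expects <repository>/<model-name>/config.pbtxt
# alongside a numbered version directory such as 1/.
model_dir = Path("model_repository/bert-trt")
(model_dir / "1").mkdir(parents=True, exist_ok=True)
(model_dir / "config.pbtxt").write_text(config.strip() + "\n")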
0
votes
0 answers

Why does PyTorch 2.0 introduce the Triton DSL as the backend language for Nvidia devices?

PyTorch 2.0 introduced a compiler, Inductor, and Inductor generates Triton DSL for producing PTX code. I am curious why the Triton DSL, rather than any other DSL that can be compiled to PTX code, was selected as the backend language for Inductor. Is it…
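
Editor's note: for readers unfamiliar with the DSL in question, Triton kernels are written directly in Python, which is a large part of why Inductor can emit them. A minimal hand-written kernel, as a sketch (requires the triton package and a CUDA GPU):

# A tiny hand-written Triton kernel, illustrating the DSL that
# Inductor targets. Requires `triton` and a CUDA-capable GPU.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                  # one program per block
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                  # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)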
0
votes
0 answers

How to pass an inference request of type tritonclient.http to a multi-model endpoint in AWS SageMaker?

Setup: a multi-model endpoint in AWS SageMaker with NVIDIA Triton server. Based on the documentation provided here ->…
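
Editor's note: as a rough sketch of one way to send such a request, assuming a hypothetical endpoint and the KServe v2 JSON format that SageMaker's Triton containers accept (all names below are placeholders):

# Hypothetical sketch: invoke a SageMaker multi-model endpoint
# backed by Triton using the KServe v2 JSON wire format.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": [
        {
            "name": "input_1",       # must match the model's config.pbtxt
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        }
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="my-triton-mme",    # placeholder endpoint name
    TargetModel="model_a.tar.gz",    # selects the model within the MME
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())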
0
votes
1 answer

How to pass inputs to my Triton model using the tritonclient Python package?

My Triton model's config.pbtxt file looks like the one below. How can I pass inputs and outputs using tritonclient and perform an infer request? name: "cifar10" platform: "tensorflow_savedmodel" max_batch_size: 10000 input [ { name: "input_1" data_type:…
Mahesh
  • 25
  • 6
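
Editor's note: the answer boils down to the standard tritonclient HTTP flow. A minimal sketch against the cifar10 config above, assuming a server on localhost:8000 (the output tensor name is a placeholder):

# Minimal sketch: send an inference request with the tritonclient
# HTTP API, matching the cifar10 config above.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(4, 32, 32, 3).astype(np.float32)  # 4 CIFAR images

infer_input = httpclient.InferInput("input_1", batch.shape, "FP32")
infer_input.set_data_from_numpy(batch)

outputs = [httpclient.InferRequestedOutput("output_1")]  # placeholder name

response = client.infer("cifar10", inputs=[infer_input], outputs=outputs)
print(response.as_numpy("output_1").shape)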
0
votes
0 answers

Loading an ONNX Runtime-optimized model in Triton - Error: Unrecognized attribute: mask_filter_value for operator Attention

I converted my model to ONNX and then ran the onnxruntime transformer optimization step. The model loads successfully and its logits match the native model as well. I moved this model to the Triton server but am facing the following…
Hammad Hassan
  • 1,192
  • 17
  • 29
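
Editor's note: the optimization step the asker describes usually looks like the sketch below; an "Unrecognized attribute" error for the fused Attention op generally means the ONNX Runtime inside the Triton container is older than the one that produced the optimized graph. Paths and model parameters here are placeholders.

# Sketch of the onnxruntime transformers optimization step described
# above. The mask_filter_value attribute on the fused Attention op is
# only understood by sufficiently new ONNX Runtime builds, so the ORT
# version inside the Triton container must be at least as new as the
# one used here.
from onnxruntime.transformers import optimizer

optimized = optimizer.optimize_model(
    "model.onnx",        # placeholder path
    model_type="bert",
    num_heads=12,        # placeholder hyperparameters
    hidden_size=768,
)
optimized.save_model_to_file("model_optimized.onnx")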
0
votes
1 answer

Triton Inference Server: deploy a model with input shape BxN in config.pbtxt

I have installed Triton Inference Server with Docker: docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /mnt/data/nabil/triton_server/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models I have…
Zabir Al Nazi
  • 10,298
  • 4
  • 33
  • 60
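
Editor's note: a sketch of the usual answer. When max_batch_size is set, the batch dimension B is implicit, and a variable-length N is written as -1 in dims. Assuming a hypothetical ONNX model (names and dtypes are placeholders):

# Sketch: config.pbtxt for a BxN input. With max_batch_size > 0 the
# batch dimension B is implicit, so dims describes one item and a
# variable N is just -1. Names and dtypes are placeholders.
from pathlib import Path

config = """
name: "my_bxn_model"
backend: "onnxruntime"
max_batch_size: 32
input [
  { name: "tokens", data_type: TYPE_INT64, dims: [ -1 ] }
]
output [
  { name: "scores", data_type: TYPE_FP32, dims: [ -1 ] }
]
"""

model_dir = Path("models/my_bxn_model")
model_dir.mkdir(parents=True, exist_ok=True)
(model_dir / "config.pbtxt").write_text(config.strip() + "\n")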
0
votes
1 answer

Triton Inference Server - tritonserver: not found

I am trying to run NVIDIA's Triton Inference Server. I pulled the pre-built container nvcr.io/nvidia/pytorch:22.06-py3 and then ran it with the command run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/F/models:/models…
Antonina
  • 604
  • 1
  • 5
  • 16
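
Editor's note: the usual diagnosis here is that the nvcr.io/nvidia/pytorch images do not ship the tritonserver binary; it lives in the nvcr.io/nvidia/tritonserver images instead. A sketch with the Docker SDK for Python, mirroring the asker's flags (tag and host path are illustrative):

# Sketch: launch Triton from the image that actually contains the
# `tritonserver` binary (nvcr.io/nvidia/tritonserver, not the pytorch
# image). Uses the Docker SDK for Python.
import docker

client = docker.from_env()
container = client.containers.run(
    "nvcr.io/nvidia/tritonserver:22.06-py3",
    "tritonserver --model-repository=/models",
    ports={"8000/tcp": 8000, "8001/tcp": 8001, "8002/tcp": 8002},
    volumes={"/F/models": {"bind": "/models", "mode": "ro"}},
    device_requests=[docker.types.DeviceRequest(count=1, capabilities=[["gpu"]])],
    detach=True,
    remove=True,
)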
0
votes
0 answers

Why does Triton serving with shared memory fail when running multiple uvicorn workers to send multiple requests concurrently to the models?

I run a model in Triton serving with shared memory and it works correctly. In order to simulate the backend structure I wrote a FastAPI app for my model and ran it with gunicorn with 6 workers. Then I wrote another FastAPI app to route Locust requests to my…
MediaJ
  • 41
  • 7
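
Editor's note: a common culprit is every worker registering the same shared-memory region name and key. A sketch of per-worker region names with the tritonclient shared-memory helpers (region names, sizes, and the model are placeholders):

# Sketch: give each worker process its own system shared-memory
# region so concurrent workers don't clobber one another's
# registrations. Names, byte sizes, and tensor details are placeholders.
import os
import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.shared_memory as shm

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3).astype(np.float32)
byte_size = data.nbytes

# Unique per worker: the PID keeps region names from colliding.
region = f"input_data_{os.getpid()}"
key = f"/input_{os.getpid()}"

handle = shm.create_shared_memory_region(region, key, byte_size)
shm.set_shared_memory_region(handle, [data])
client.register_system_shared_memory(region, key, byte_size)

infer_input = httpclient.InferInput("input_1", data.shape, "FP32")
infer_input.set_shared_memory(region, byte_size)
result = client.infer("my_model", inputs=[infer_input])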
0
votes
0 answers

Integrating Triton into GitLab CI

I'm having problems implementing a Triton service in GitLab CI. As I noticed in the Triton GitHub repo https://github.com/triton-inference-server/server, they don't expose any port by default in the Dockerfile and I'm not really able to access the…
Leemosh
  • 883
  • 6
  • 19
0
votes
1 answer

Nvidia Triton TensorFlow string parameter

I have a TensorFlow model with a string parameter as input. What's the type to use for strings in the Triton Java API? E.g. model definition: { "name":"test_model", "platform":"tensorflow_savedmodel", "backend":"tensorflow", …
oluies
  • 17,694
  • 14
  • 74
  • 117
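
Editor's note: the question asks about the Java API, but the wire-level answer is the same in every client: Triton exposes TensorFlow string tensors as its BYTES datatype. For comparison, a sketch with the Python client (model and tensor names are placeholders):

# Sketch: Triton represents TF string tensors as the BYTES datatype.
# Shown with the Python client for comparison; names are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

strings = np.array([["hello"], ["world"]], dtype=np.object_)

infer_input = httpclient.InferInput("text_input", strings.shape, "BYTES")
infer_input.set_data_from_numpy(strings)

response = client.infer("test_model", inputs=[infer_input])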
0
votes
0 answers

Is there any efficient way to convert Z3's AST into assembly code?

I need something like this for the x86 arch: mov edi, dword ptr [0x7fc70000] add edi, 0x11 sub edi, 0x33F0B753 After Z3 simplification I get (memory 0x7FC70000 is symbolized): bvadd (_ bv3423553726 32) MEM_0x7FC70000 The last step is converting…
DBenson
  • 377
  • 3
  • 12
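
Editor's note: this question and the next one (same asker) both reduce to walking the simplified Z3 AST. A minimal z3py sketch that rebuilds the expression above and emits toy x86 from it; the emitter only handles flat two-operand add/sub and is not a real code generator.

# Minimal z3py sketch: rebuild the asker's expression, simplify it,
# and walk the resulting AST node by node.
from z3 import BitVec, simplify, is_bv_value, Z3_OP_BVADD, Z3_OP_BVSUB

MEM = BitVec("MEM_0x7FC70000", 32)
expr = simplify(MEM + 0x11 - 0x33F0B753)  # folds to bvadd (_ bv3423553726 32) MEM_0x7FC70000

OPS = {Z3_OP_BVADD: "add", Z3_OP_BVSUB: "sub"}

def emit(e, reg="edi"):
    # Toy emitter: leaf loads plus flat two-operand add/sub only.
    if is_bv_value(e):
        return [f"mov {reg}, {hex(e.as_long())}"]
    if e.num_args() == 0:  # symbolic leaf, e.g. a symbolized memory cell
        return [f"mov {reg}, dword ptr [{e.decl().name()}]"]
    lhs, rhs = e.children()
    code = emit(lhs, reg)
    operand = hex(rhs.as_long()) if is_bv_value(rhs) else f"dword ptr [{rhs.decl().name()}]"
    code.append(f"{OPS[e.decl().kind()]} {reg}, {operand}")
    return code

print("\n".join(emit(expr)))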
0
votes
1 answer

What is the best way to translate Z3's AST into ASM code?

Here is an example: mov edi, dword ptr [0x7fc70000] add edi, 0x11 sub edi, 0x33F0B753 After Z3 simplification I get (memory 0x7FC70000 is symbolized): bvadd (_ bv3423553726 32) MEM_0x7FC70000 Now I need to convert the Z3 AST into ASM to get the result…
DBenson
  • 377
  • 3
  • 12
0
votes
0 answers

Terraform doesn't build a Triton machine

I've taken my first steps into the world of Terraform; I'm trying to deploy infrastructure on Joyent Triton. After setup, I wrote my first .tf (well, copied it from the examples) and ran terraform apply. All seems to go well, and it doesn't break on errors,…
Erwin
  • 1
  • 1
-2
votes
1 answer

Is it possible to use the latest Triton server version on an older CUDA driver (470) by using cuda-compat 12.1?

For some reason, I didn't update the CUDA driver version of my environment; I'm currently using 470.42.01. But I wanted to use the latest triton-inference-server (23.04), which requires NVIDIA CUDA 12.1.0 by default, so I tried something like this: FROM…
聂小涛
  • 503
  • 3
  • 16