Questions tagged [triton]

Triton is an open-source project providing hybrid cloud computing infrastructure, sponsored by Joyent.

Triton was formerly named SmartDataCenter, and the GitHub repository still uses the terms SDC and SmartDataCenter interchangeably. Note that many questions under this tag instead concern NVIDIA's Triton Inference Server, the Triton GPU kernel language used by PyTorch 2.0, or the Triton binary analysis framework.

29 questions
0
votes
0 answers

How to set up a configuration file for SageMaker Triton inference?

I have been looking at examples and ran into this one from AWS: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/ensemble/sentence-transformer-trt/examples/ensemble_hf/bert-trt/config.pbtxt. Based on this example, we need to…
suwa
  • 23
  • 4
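
Editor's note: a Triton model repository needs a config.pbtxt next to a numbered version directory. Below is a minimal sketch of generating one, assuming a hypothetical TensorRT model; the names, dtypes, and dims are placeholders, not values from the linked AWS example.

# Hypothetical sketch: write a minimal Triton config.pbtxt.
# All names, dtypes, and dims below are placeholders.
from pathlib import Path

config = """
name: "bert-trt"
platform: "tensorrt_plan"
max_batch_size: 16
input [
  {
    name: "token_ids"      # placeholder tensor name
    data_type: TYPE_INT32
    dims: [ 128 ]
  }
]
output [
  {
    name: "embeddings"     # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]
"""

# Triton expects <repository>/<model-name>/config.pbtxt
# alongside a numbered version directory such as 1/.
model_dir = Path("model_repository/bert-trt")
(model_dir / "1").mkdir(parents=True, exist_ok=True)
(model_dir / "config.pbtxt").write_text(config.strip() + "\n")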
0
votes
0 answers

Why does PyTorch 2.0 introduce the Triton DSL as the backend language for Nvidia devices?

PyTorch 2.0 introduced a compiler, Inductor, and Inductor generates Triton DSL for producing PTX code. I am curious why the Triton DSL, rather than any other DSL that can be compiled to PTX code, was selected as the backend language for Inductor. Is it…
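
Editor's note: for readers unfamiliar with the DSL in question, Triton kernels are written directly in Python, which is a large part of why Inductor can emit them. A minimal hand-written kernel, as a sketch (requires the triton package and a CUDA GPU):

# A tiny hand-written Triton kernel, illustrating the DSL that
# Inductor targets. Requires `triton` and a CUDA-capable GPU.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                  # one program per block
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                  # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)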
0
votes
0 answers

How to pass an inference request of type tritonclient.http to a multi-model endpoint in AWS SageMaker?

Setup: a multi-model endpoint in AWS SageMaker with NVIDIA Triton server. Based on the documentation provided here ->…
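
Editor's note: as a rough sketch of one way to send such a request, assuming a hypothetical endpoint and the KServe v2 JSON format that SageMaker's Triton containers accept (all names below are placeholders):

# Hypothetical sketch: invoke a SageMaker multi-model endpoint
# backed by Triton using the KServe v2 JSON wire format.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": [
        {
            "name": "input_1",       # must match the model's config.pbtxt
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        }
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="my-triton-mme",    # placeholder endpoint name
    TargetModel="model_a.tar.gz",    # selects the model within the MME
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())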
0
votes
1 answer

How to pass inputs to my Triton model using the tritonclient Python package?

My Triton model's config.pbtxt file looks like the one below. How can I pass inputs and outputs using tritonclient and perform an infer request? name: "cifar10" platform: "tensorflow_savedmodel" max_batch_size: 10000 input [ { name: "input_1" data_type:…
Mahesh
  • 25
  • 6
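
Editor's note: the answer boils down to the standard tritonclient HTTP flow. A minimal sketch against the cifar10 config above, assuming a server on localhost:8000 (the output tensor name is a placeholder):

# Minimal sketch: send an inference request with the tritonclient
# HTTP API, matching the cifar10 config above.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(4, 32, 32, 3).astype(np.float32)  # 4 CIFAR images

infer_input = httpclient.InferInput("input_1", batch.shape, "FP32")
infer_input.set_data_from_numpy(batch)

outputs = [httpclient.InferRequestedOutput("output_1")]  # placeholder name

response = client.infer("cifar10", inputs=[infer_input], outputs=outputs)
print(response.as_numpy("output_1").shape)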
0
votes
0 answers

Loading an ONNX Runtime-optimized model in Triton - Error: Unrecognized attribute: mask_filter_value for operator Attention

I converted my model to ONNX and then ran the onnxruntime transformer optimization step. The model loads successfully and its logits match the native model as well. I moved this model to the Triton server but am facing the following…
Hammad Hassan
  • 1,192
  • 17
  • 29
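
Editor's note: the optimization step the asker describes usually looks like the sketch below; an "Unrecognized attribute" error for the fused Attention op generally means the ONNX Runtime inside the Triton container is older than the one that produced the optimized graph. Paths and model parameters here are placeholders.

# Sketch of the onnxruntime transformers optimization step described
# above. The mask_filter_value attribute on the fused Attention op is
# only understood by sufficiently new ONNX Runtime builds, so the ORT
# version inside the Triton container must be at least as new as the
# one used here.
from onnxruntime.transformers import optimizer

optimized = optimizer.optimize_model(
    "model.onnx",        # placeholder path
    model_type="bert",
    num_heads=12,        # placeholder hyperparameters
    hidden_size=768,
)
optimized.save_model_to_file("model_optimized.onnx")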
0
votes
1 answer

Triton Inference Server: deploy a model with input shape BxN in config.pbtxt

I have installed Triton Inference Server with Docker: docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /mnt/data/nabil/triton_server/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models I have…
Zabir Al Nazi
  • 10,298
  • 4
  • 33
  • 60
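
Editor's note: a sketch of the usual answer. When max_batch_size is set, the batch dimension B is implicit, and a variable-length N is written as -1 in dims. Assuming a hypothetical ONNX model (names and dtypes are placeholders):

# Sketch: config.pbtxt for a BxN input. With max_batch_size > 0 the
# batch dimension B is implicit, so dims describes one item and a
# variable N is just -1. Names and dtypes are placeholders.
from pathlib import Path

config = """
name: "my_bxn_model"
backend: "onnxruntime"
max_batch_size: 32
input [
  { name: "tokens", data_type: TYPE_INT64, dims: [ -1 ] }
]
output [
  { name: "scores", data_type: TYPE_FP32, dims: [ -1 ] }
]
"""

model_dir = Path("models/my_bxn_model")
model_dir.mkdir(parents=True, exist_ok=True)
(model_dir / "config.pbtxt").write_text(config.strip() + "\n")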
0
votes
1 answer

Triton Inference Server - tritonserver: not found

I am trying to run NVIDIA's Triton Inference Server. I pulled the pre-built container nvcr.io/nvidia/pytorch:22.06-py3 and then ran it with the command run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/F/models:/models…
Antonina
  • 604
  • 1
  • 5
  • 16
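
Editor's note: the usual diagnosis here is that the nvcr.io/nvidia/pytorch images do not ship the tritonserver binary; it lives in the nvcr.io/nvidia/tritonserver images instead. A sketch with the Docker SDK for Python, mirroring the asker's flags (tag and host path are illustrative):

# Sketch: launch Triton from the image that actually contains the
# `tritonserver` binary (nvcr.io/nvidia/tritonserver, not the pytorch
# image). Uses the Docker SDK for Python.
import docker

client = docker.from_env()
container = client.containers.run(
    "nvcr.io/nvidia/tritonserver:22.06-py3",
    "tritonserver --model-repository=/models",
    ports={"8000/tcp": 8000, "8001/tcp": 8001, "8002/tcp": 8002},
    volumes={"/F/models": {"bind": "/models", "mode": "ro"}},
    device_requests=[docker.types.DeviceRequest(count=1, capabilities=[["gpu"]])],
    detach=True,
    remove=True,
)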
0
votes
0 answers

Why does Triton serving with shared memory fail when running multiple uvicorn workers to send multiple requests concurrently to the models?

I run a model in Triton serving with shared memory and it works correctly. In order to simulate the backend structure I wrote a FastAPI app for my model and ran it with gunicorn with 6 workers. Then I wrote another FastAPI app to route Locust requests to my…
MediaJ
  • 41
  • 7
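
Editor's note: a common culprit is every worker registering the same shared-memory region name and key. A sketch of per-worker region names with the tritonclient shared-memory helpers (region names, sizes, and the model are placeholders):

# Sketch: give each worker process its own system shared-memory
# region so concurrent workers don't clobber one another's
# registrations. Names, byte sizes, and tensor details are placeholders.
import os
import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.shared_memory as shm

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3).astype(np.float32)
byte_size = data.nbytes

# Unique per worker: the PID keeps region names from colliding.
region = f"input_data_{os.getpid()}"
key = f"/input_{os.getpid()}"

handle = shm.create_shared_memory_region(region, key, byte_size)
shm.set_shared_memory_region(handle, [data])
client.register_system_shared_memory(region, key, byte_size)

infer_input = httpclient.InferInput("input_1", data.shape, "FP32")
infer_input.set_shared_memory(region, byte_size)
result = client.infer("my_model", inputs=[infer_input])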
0
votes
0 answers

Integrating Triton into GitLab CI

I'm having problems implementing a Triton service in GitLab CI. As I noticed in the Triton GitHub repo https://github.com/triton-inference-server/server, they don't expose any port by default in the Dockerfile and I'm not really able to access the…
Leemosh
  • 883
  • 6
  • 19
0
votes
1 answer

Nvidia Triton TensorFlow string parameter

I have a TensorFlow model with a string parameter as input. What's the type to use for strings in the Triton Java API? E.g. model definition: { "name":"test_model", "platform":"tensorflow_savedmodel", "backend":"tensorflow", …
oluies
  • 17,694
  • 14
  • 74
  • 117
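
Editor's note: the question asks about the Java API, but the wire-level answer is the same in every client: Triton exposes TensorFlow string tensors as its BYTES datatype. For comparison, a sketch with the Python client (model and tensor names are placeholders):

# Sketch: Triton represents TF string tensors as the BYTES datatype.
# Shown with the Python client for comparison; names are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

strings = np.array([["hello"], ["world"]], dtype=np.object_)

infer_input = httpclient.InferInput("text_input", strings.shape, "BYTES")
infer_input.set_data_from_numpy(strings)

response = client.infer("test_model", inputs=[infer_input])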
0
votes
0 answers

Is there any efficient way to convert Z3's AST into assembly code?

I need something like this for the x86 arch: mov edi, dword ptr [0x7fc70000] add edi, 0x11 sub edi, 0x33F0B753 After Z3 simplification I get (memory 0x7FC70000 is symbolized): bvadd (_ bv3423553726 32) MEM_0x7FC70000 The last step is converting…
DBenson
  • 377
  • 3
  • 12
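
Editor's note: this question and the next one (same asker) both reduce to walking the simplified Z3 AST. A minimal z3py sketch that rebuilds the expression above and emits toy x86 from it; the emitter only handles flat two-operand add/sub and is not a real code generator.

# Minimal z3py sketch: rebuild the asker's expression, simplify it,
# and walk the resulting AST node by node.
from z3 import BitVec, simplify, is_bv_value, Z3_OP_BVADD, Z3_OP_BVSUB

MEM = BitVec("MEM_0x7FC70000", 32)
expr = simplify(MEM + 0x11 - 0x33F0B753)  # folds to bvadd (_ bv3423553726 32) MEM_0x7FC70000

OPS = {Z3_OP_BVADD: "add", Z3_OP_BVSUB: "sub"}

def emit(e, reg="edi"):
    # Toy emitter: leaf loads plus flat two-operand add/sub only.
    if is_bv_value(e):
        return [f"mov {reg}, {hex(e.as_long())}"]
    if e.num_args() == 0:  # symbolic leaf, e.g. a symbolized memory cell
        return [f"mov {reg}, dword ptr [{e.decl().name()}]"]
    lhs, rhs = e.children()
    code = emit(lhs, reg)
    operand = hex(rhs.as_long()) if is_bv_value(rhs) else f"dword ptr [{rhs.decl().name()}]"
    code.append(f"{OPS[e.decl().kind()]} {reg}, {operand}")
    return code

print("\n".join(emit(expr)))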
0
votes
1 answer

What is the best way to translate Z3's AST into ASM code?

Here is an example: mov edi, dword ptr [0x7fc70000] add edi, 0x11 sub edi, 0x33F0B753 After Z3 simplification I get (memory 0x7FC70000 is symbolized): bvadd (_ bv3423553726 32) MEM_0x7FC70000 Now I need to convert the Z3 AST into ASM to get the result…
DBenson
  • 377
  • 3
  • 12
0
votes
0 answers

Terraform doesn't build a Triton machine

I've taken my first steps into the world of Terraform; I'm trying to deploy infrastructure on Joyent Triton. After setup, I wrote my first .tf (well, copied it from the examples) and ran terraform apply. All seems to go well, and it doesn't break on errors,…
Erwin
  • 1
  • 1
-2
votes
1 answer

Is it possible to use the latest Triton server version on an older CUDA driver (470) by using cuda-compat 12.1?

For some reason, I didn't update the CUDA driver version of my environment; I'm currently using 470.42.01. But I wanted to use the latest triton-inference-server (23.04), which requires NVIDIA CUDA 12.1.0 by default, so I tried something like this: FROM…
聂小涛
  • 503
  • 3
  • 16