I'm new to AI, so apologies if I use the wrong terminology here.
I'm extracting some information from a body of text, and I have set up Llama 2 on Hugging Face via their Inference Endpoints so I can call it with curl.
The curl call works for short inputs, and I get a generated_text answer back, but longer responses come back severely truncated: I receive only a few words where I'm expecting a lot more.
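For reference, here is roughly what my call looks like (the endpoint URL, token, and prompt are placeholders):

```bash
# Minimal call to my Inference Endpoint (URL and token redacted)
curl https://my-endpoint.endpoints.huggingface.cloud \
  -X POST \
  -H "Authorization: Bearer hf_xxx" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Extract the key facts from the following text: ..."}'
```

The response comes back as a JSON array like [{"generated_text": "..."}], and the generated_text field is where the truncation shows up.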
So I want to set max_new_tokens to a large number and temperature to 0, but I couldn't find where to do that. I don't mind whether it's set in the curl call or configured directly on the endpoint; either is fine. Does anyone know how to do this?
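To show what I mean, this is the kind of payload shape I was hoping would exist; the "parameters" object and its field names here are just my guess, not something I found in the docs:

```bash
# What I was hoping would work -- the "parameters" object is a guess on my part
curl https://my-endpoint.endpoints.huggingface.cloud \
  -X POST \
  -H "Authorization: Bearer hf_xxx" \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": "Extract the key facts from the following text: ...",
        "parameters": {
          "max_new_tokens": 2000,
          "temperature": 0
        }
      }'
```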