
I'm new to AI, so apologies if I use the wrong terminology here.

I'm extracting some information from a body of text, and have set up Llama 2 on Hugging Face via their Inference Endpoints so I can call it via curl.

The curl call works for short inputs and short generated_text answers, but longer responses seem to be severely truncated: I get only a few words when I'm expecting a lot more.

So I wanted to set max_new_tokens to a large number and temperature to 0, but I didn't see how to do that. I don't care whether it's set in the curl call or configured directly on the endpoint; either is fine. Does anyone know how to do this?
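From what I can gather from the Hugging Face docs, generation options go in a "parameters" object alongside "inputs", so I'm guessing the call should look roughly like the sketch below, but I haven't been able to confirm it. (The endpoint URL and token are placeholders for my own values, and I'm not sure temperature actually accepts 0; setting "do_sample": false may be the intended way to get deterministic output instead.)

    # Guess at the payload shape: generation options nested under "parameters".
    # ENDPOINT_URL and HF_API_TOKEN are placeholders for my own endpoint and token.
    curl https://my-endpoint.endpoints.huggingface.cloud \
      -X POST \
      -H "Authorization: Bearer $HF_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "inputs": "My body of text to extract information from...",
        "parameters": {
          "max_new_tokens": 1024,
          "do_sample": false
        }
      }'

Is this the right structure, or do these settings belong somewhere in the endpoint configuration itself?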

Magnus
