I've been performing classification with the GPT-3/3.5/4 models by restricting their outputs using the `logit_bias` parameter. I am not sure how to do the same in open-source models, specifically LLaMA, Llama 2, and their derivatives.
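
For reference, this is the pattern I mean on the OpenAI side (pre-1.0 `openai` client; the token IDs are placeholders you would look up with `tiktoken` for the target model):

```python
import openai  # pre-1.0 client, matching the API I have been using

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Is this review positive or negative? ..."}],
    # Push the logits of the allowed label tokens to the maximum bias (+100)
    # so the single sampled token is effectively restricted to them.
    logit_bias={"31587": 100, "43324": 100},  # placeholder token IDs for the two labels
    max_tokens=1,
)
print(response.choices[0].message.content)
```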

I have the original model weights for LLaMA and Llama 2, but my request for Llama 2 access on Hugging Face has not been approved yet. Since approval is taking too long, I think I am going to use the inference scripts provided by Meta itself. However, I do not see any `logit_bias` parameter in the generation function of the llama scripts.

Could someone point out how to specify a logit bias for the LLaMA and Llama 2 models using Meta's scripts?
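
My current understanding is that, since Meta's `generate()` exposes no such parameter, the only option with the reference scripts is to patch the sampling loop in `llama/generation.py` and add the bias to the logits before the temperature/top-p sampling step. A rough sketch of what I mean (the variable names follow the repo's `generate()` loop, but the bias vector and token IDs are my own additions, and the logits shape differs slightly between the LLaMA-1 and Llama-2 scripts):

```python
import torch

# OpenAI-style bias dict; the token IDs are hypothetical label tokens.
logit_bias = {3869: 100.0, 1939: 100.0}

# Build a dense bias vector once, before the decoding loop starts.
bias = torch.zeros(self.model.params.vocab_size, device="cuda")
for token_id, value in logit_bias.items():
    bias[token_id] = value

# Inside the decoding loop, after the forward pass and before sampling:
logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
logits = logits + bias  # broadcasts over the batch dimension
# ...the existing temperature / top_p sampling then runs on the biased logits.
```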

I see Hugging Face has a way to use a logit bias here. I have not tried it yet, but I hope I can use it for Llama derivatives in Hugging Face. However, I am unable to run Hugging Face's conversion script to turn my Meta LLaMA weights into the HF format, due to memory constraints on the remote server I access over SSH.
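
If the conversion works out, this is roughly how I expect to apply the bias with `transformers`, using a custom `LogitsProcessor` (the model name is a placeholder for whichever HF-format Llama checkpoint I end up with, and the token IDs are again hypothetical):

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class LogitBiasProcessor(LogitsProcessor):
    """Add a fixed bias to the scores of selected token IDs before sampling."""

    def __init__(self, logit_bias):
        self.logit_bias = logit_bias  # {token_id: bias}, OpenAI-style

    def __call__(self, input_ids, scores):
        for token_id, bias in self.logit_bias.items():
            scores[:, token_id] += bias
        return scores

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Is this review positive or negative? ...", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=1,
    logits_processor=LogitsProcessorList(
        [LogitBiasProcessor({3869: 100.0, 1939: 100.0})]  # hypothetical label tokens
    ),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```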

  • Insofar as there are existing implementations of this, you might want to look at [Microsoft Guidance](https://github.com/microsoft/guidance) and [LMQL](https://lmql.ai/) as a starting point. Indeed, LMQL's docs [talk about why OpenAI's API is comparatively limiting](https://docs.lmql.ai/en/latest/language/openai.html#openai-api-limitations). – Charles Duffy Jul 28 '23 at 13:04
  • @CharlesDuffy Yep, Guidance seems to be using the Hugging Face implementation. So far the best solution is just to use Hugging Face. – tanny411 Jul 29 '23 at 03:15
  • LMQL supports llama.cpp; HF is not your only choice. And you can get GGML-quantized versions of Llama 2 directly from TheBloke if your access to the original weights is still held up (see the CLI sketch below). – Charles Duffy Jul 29 '23 at 12:35
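
Following up on the llama.cpp pointer: its `main` example exposes this directly through a `--logit-bias TOKEN_ID(+/-)BIAS` flag, so no code changes are needed there. A sketch, with a placeholder model path and token IDs:

```
./main -m ./models/llama-2-7b.ggmlv3.q4_0.bin \
  -p "Is this review positive or negative? ..." \
  -n 1 \
  --logit-bias 3869+100 --logit-bias 1939+100
```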

0 Answers