In my use case I am using OpenAI models hosted on Azure. I am trying to generate a list of sentences or words of a specific length. Let's take this prompt as an example:
Give 10 Examples of pizza ingredients:
1. tomatoes
2. mushrooms
The text-davinci-003 model completes the list as expected and stops, but the gpt-3.5-turbo model keeps generating tokens until the token limit is reached, even when I explicitly tell the model to stop once the task is done. Few-shot prompting doesn't seem to help here either.
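For reference, this is roughly how I call the model (a minimal sketch using the pre-1.0 openai Python SDK against Azure; the resource URL, API version, key, and deployment name are placeholders for my actual values):

```python
import openai

# Azure OpenAI configuration (placeholder values, not my real endpoint)
openai.api_type = "azure"
openai.api_base = "https://MY-RESOURCE.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "MY-API-KEY"

prompt = (
    "Give 10 Examples of pizza ingredients:\n"
    "1. tomatoes\n"
    "2. mushrooms\n"
)

# gpt-3.5-turbo via the chat endpoint: instead of stopping after item 10,
# this keeps producing text until the token limit is hit
response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # Azure deployment name (placeholder)
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```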
Hacky workarounds I have tried:
Using a low value for max_tokens. But it is hard to estimate a good value because parts of the prompt are changed dynamically by the application, and the output still needs postprocessing to remove the wasted tokens.
Putting a counter before each example and using a specific number as the stop sequence. With a plain counter like the one above, I have to make sure the stop sequence ("11." for a 10-item list) is never generated accidentally inside an item, which would cut the output short. With an unusual counter like "1~~", "2~~", ... there is a chance that the model malforms the stop sequence, in which case it again generates until the limit is reached.
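For completeness, a sketch of the counter workaround (same placeholder Azure setup as in the snippet above); the model is told to emit a sentinel number after the last item, and that number is used as the stop sequence:

```python
# Counter workaround sketch: number the items and stop before item 11.
response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # Azure deployment name (placeholder)
    messages=[{"role": "user", "content": (
        "Give 10 examples of pizza ingredients as a numbered list "
        "(1., 2., ...), then write 11. and nothing else."
    )}],
    stop=["11."],    # generation halts right before the sentinel
    max_tokens=256,  # safety net in case the sentinel is malformed/skipped
    temperature=0,
)
items = response.choices[0].message.content.strip()
# If the sentinel was never emitted and max_tokens kicked in instead,
# the last line may be truncated and still needs postprocessing.
print(items)
```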
Is there a clean and simple way to make the model stop generating once the task is done, the way text-davinci-003 does?