
I have noticed that my deployment of gpt-35-turbo on "Azure AI Studio" is not giving consistent responses to my chat completion prompts even when I set the temperature to 0. The longer the prompt, the more inconsistency I see.

I thought the idea with setting temperature to 0 meant consistent (deterministic) responses (given the same model). Is that not the case?

alex9311

2 Answers


The temperature parameter controls the randomness of the generated text. While a value of 0 yields the least random, most deterministic responses, they will not necessarily be exactly the same.

Ideally, in cases where you would like deterministic responses (say, for the same input across devices/users/etc.), a response cache would help.
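A response cache like the one suggested above could be sketched as follows. This is a minimal, hedged illustration: `call_model` stands in for whatever client call you actually use (e.g. the Azure OpenAI chat completions endpoint), and the hashing scheme is just one reasonable choice of cache key.

```python
import hashlib
import json

# Cache completions keyed on the full request payload, so repeated
# identical requests return the stored text instead of re-sampling.
_cache: dict[str, str] = {}

def cached_completion(messages, call_model, temperature=0):
    # Serialize the prompt and parameters deterministically to form a
    # stable cache key (sort_keys keeps the JSON order fixed).
    key = hashlib.sha256(
        json.dumps({"messages": messages, "temperature": temperature},
                   sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        # Only the first identical request actually hits the model.
        _cache[key] = call_model(messages, temperature)
    return _cache[key]
```

In a real deployment you would likely swap the in-memory dict for a shared store (e.g. Redis) so the cache holds across processes and machines.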

Also, while the docs recommend against modifying it alongside temperature, you could use the top_p parameter to further constrain the output. It restricts sampling to the smallest set of candidate tokens whose cumulative probability exceeds top_p (nucleus sampling).
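For illustration, a request that pins down both sampling knobs might look like the sketch below. The parameter values are assumptions for the sake of the example, and `client` would be an `AzureOpenAI` instance configured with your endpoint and key (not shown).

```python
# Hedged sketch of a chat-completion request payload that sets both
# sampling parameters low for maximum consistency. Note the docs advise
# changing only one of temperature/top_p in normal use.
payload = {
    "model": "gpt-35-turbo",  # your Azure deployment name (assumed)
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0,   # least random sampling
    "top_p": 0.1,       # restrict nucleus to the top 10% probability mass
}
# response = client.chat.completions.create(**payload)
```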

This discussion on the OpenAI forums goes into how you can use both to your benefit and better control the response from the models.

PramodValavala
  • I saw the docs mentioned specifically not modifying top_p if I'm already modifying temperature. You're suggesting I set temp 0 and top_p to .1 to get the most consistent results? – alex9311 Aug 09 '23 at 22:44
  • @alex9311 are you getting consistent response? – Rajeesh Menoth Aug 22 '23 at 06:33

I thought the idea with setting temperature to 0 meant consistent (deterministic) responses (given the same model). Is that not the case?

It's indeed not the case, for two reasons:

  1. GPU non-determinism: parallel floating-point operations can be reduced in varying orders across runs, producing slightly different logits.
  2. This blogpost authored by Sherman Chann argues that "Non-determinism in GPT-4 is caused by Sparse MoE [mixture of experts]".
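The GPU non-determinism point comes down to floating-point arithmetic not being associative: summing the same values in a different order (as parallel reductions on a GPU may do) can give slightly different results, which is enough to flip the argmax between two near-tied tokens even at temperature 0. A small CPU-side demonstration:

```python
# Floating-point addition is not associative: grouping the same three
# values differently changes the result, because the small term is
# absorbed when added to the huge one first.
a, b, c = 0.1, 1e16, -1e16
left = (a + b) + c   # 0.1 is lost when added to 1e16 first
right = a + (b + c)  # the large values cancel first, so 0.1 survives
print(left, right)   # the two groupings disagree
```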
Franck Dernoncourt