I'm exploring the usage of different prompts on gpt-3.5-turbo.
While investigating the differences between "ChatCompletion" and "Completion", I found references saying they should be more or less the same, for example: https://platform.openai.com/docs/guides/gpt/chat-completions-vs-completions
Other sources say, as expected, that ChatCompletion is more useful for chatbots, since you have "roles" (system, user, and assistant) with which you can orchestrate things like few-shot examples and/or memory of previous chat messages, while Completion is more useful for summarization or plain text generation.
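For illustration, here is a sketch of how those roles can be used (the message contents are made up):

```
# Sketch: roles carrying a few-shot example and "memory" of a previous turn.
# All message contents here are hypothetical.
messages = [
    {"role": "system", "content": "You are a helpful assistant that answers briefly."},
    # few-shot example
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
    # earlier turns stay in the list, so this follow-up can refer back to them
    {"role": "user", "content": "And what is that number squared?"},
]
```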
But the difference seems to be much bigger than that, and I can't find references explaining what is happening under the hood.
The following experiment gives me totally different results, even when using the same model with the same parameters.
With ChatCompletion
```
import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
openai.api_base = ...
openai.api_key = ...

chat_response = openai.ChatCompletion.create(
    engine="my_model",  # gpt-35-turbo
    messages=[{"role": "user", "content": "Give me something intresting:\n"}],
    temperature=0,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None)

print(chat_response.choices[0]['message']['content'])
```
The result is a fact about a war:
Did you know that the shortest war in history was between Britain and Zanzibar in 1896? It lasted only 38 minutes!
With Completion
```
regular_response = openai.Completion.create(
    engine="my_model",  # gpt-35-turbo
    prompt="Give me something intresting:\n",
    temperature=0,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None)

print(regular_response['choices'][0]['text'])
```
The result is Python code and an explanation of what it does:
```
import random
import string
def random_string(length):
    return ''.join(random.choice(string.ascii_letters) for i in range(length))
print(random_string(10))
```
Output:
```
'JvJvJvJvJv'
```
This code generates a random string of length `length` using `string.ascii_letters` and `random.choice()`. `string.ascii_letters` is a string containing all ASCII letters (uppercase and lowercase). `random.choice()` returns a random element from a sequence. The `for` loop generates `length` number of random letters and `join()` concatenates them into a single string. The result is a random string of length `length`. This can be useful for generating random passwords or other unique identifiers.<|im_end|>
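One clue: the raw completion ends with a `<|im_end|>` token, which looks like a ChatML end-of-message marker. My guess (only a guess, since I can't find it documented for this model) is that ChatCompletion wraps the messages in ChatML-style tokens before sending them to the same underlying model, so an equivalent Completion call would look roughly like this:

```
# Assumption, not documented behavior: approximate what ChatCompletion
# might be sending under the hood by formatting the prompt as ChatML.
chatml_prompt = (
    "<|im_start|>user\n"
    "Give me something intresting:\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)

regular_response = openai.Completion.create(
    engine="my_model",  # gpt-35-turbo
    prompt=chatml_prompt,
    temperature=0,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=["<|im_end|>"])  # stop at the end-of-message marker

print(regular_response['choices'][0]['text'])
```

If that assumption is right, the bare prompt in my Completion experiment is just being continued as free text, with no assistant framing at all, but I'd like confirmation.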
Notes
- I'm using the same parameters (temperature, top_p, etc.). The only difference is the ChatCompletion/Completion API.
- The model is the same in both cases, gpt-35-turbo.
- I'm keeping the temperature low to get more consistent results.
- Other prompts also give totally different answers, e.g. "What is the definition of song?"
The Question
- Why is this happening?
- Shouldn't the same prompt give similar results, given that both endpoints use the same model?
- Is there any reference material where OpenAI explains what it is doing under the hood?