Chat completions will stream function calls if you set the stream parameter to true, as you can see in this blog.
What you want, though, is the ability to stream only when the response is not a function call. There is currently no parameter in the chat API that allows anything like that, but there are potential workarounds.
Which workaround fits best depends partly on your fault tolerance and the exact details of your use case; since you don't fully know how the model will respond, it's hard to find a silver bullet here.
You have a few different options. The stop parameter lets you set up to 4 sequences at which the model will stop generating, so if you have a limited number of functions (at most 4), you can set each sequence to a function name. When the function call pops up, you grab it, run your function, then go back to another streaming call. There are also ways to interrupt a stream: you can stream until you see a function call, then break and proceed the same way (a sketch of that follows the example below).
Here's an example of the stop approach. It may need some fiddling, but it should roughly stream until it comes across a function, then stop. As mentioned, you can then run the function and make a new call, depending on your exact use case and setup.
messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
functions = [
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
}
]
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo-0613",
messages=messages,
functions=functions,
stop=["get_current_weather"]
function_call="auto", # auto is default, but we'll be explicit
)
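For the interrupt approach mentioned above, here's a rough sketch reusing the same messages and functions. It streams with stream=True, relays plain text to the user, and switches to quietly assembling the call once a function_call shows up in the deltas:

import json

func_name, func_args = None, ""
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
    function_call="auto",
    stream=True,
)
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "function_call" in delta:
        # A function call is coming through; keep reading chunks to
        # assemble it instead of showing them to the user. The name
        # arrives first, the arguments trickle in as JSON fragments.
        call = delta["function_call"]
        func_name = call.get("name", func_name)
        func_args += call.get("arguments", "")
    elif delta.get("content"):
        print(delta["content"], end="")  # plain text: stream it to the user

if func_name:
    arguments = json.loads(func_args)
    # run your function here, append the result to messages as a
    # {"role": "function", ...} message, and make a follow-up call

Note that you can't break on the very first function_call chunk, because the arguments arrive spread across later chunks; you stop relaying text but keep consuming the stream.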
If those solutions don't work, you can have the stream parameter default to false and let users pass in true when they know a function call is unlikely (or vice versa), but there's a chance this could go wrong.
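A minimal sketch of that toggle, assuming a hypothetical ask wrapper around the same call:

def ask(messages, expect_function_call=False):
    # Hypothetical wrapper: stream only when the caller doesn't expect
    # a function call. If the guess is wrong, the function call will
    # simply arrive through the stream anyway.
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="auto",
        stream=not expect_function_call,
    )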
You could also try predicting it. There are different ways to do this (keywords, an additional AI/ML model): have something predict whether there will be a function call, and let that determine which path you take. That said, it will be wrong occasionally.
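A crude keyword version of such a predictor might look like the following; the keyword list is purely illustrative and would need tuning to your actual functions:

WEATHER_KEYWORDS = ("weather", "temperature", "forecast")  # illustrative only

def likely_function_call(user_message):
    # Naive heuristic: if the message mentions anything weather-related,
    # guess that the model will call get_current_weather and skip streaming.
    text = user_message.lower()
    return any(word in text for word in WEATHER_KEYWORDS)

stream = not likely_function_call(messages[-1]["content"])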
If you have specific fault tolerances (say, you can afford to not stream every time, but you can't afford to stream a function call), that may affect your design. For that scenario, you'll likely need to not stream.