I've built a macOS app that integrates with ChatGPT. To support the model's streaming responses, I'm using Alamofire's AF.streamRequest
to continuously update a label on screen. It looks something like this:
let request = AF.streamRequest(OpenAI.completionEndpoint,
                               method: .post,
                               parameters: params,
                               encoder: JSONParameterEncoder.default,
                               headers: headers)
request.responseStream { stream in
    switch stream.event {
    case let .stream(result):
        switch result {
        case let .success(data):
            let chunk = self.parse(event: data)
            completionHandler(.success(chunk))
        case .failure:
            completionHandler(.failure(.streamError))
        }
    case let .complete(data):
        do {
            try self.parse(completion: data)
            completionHandler(.success(nil))
        } catch let error as OpenAIError {
            completionHandler(.failure(error))
        } catch {
            // Fallback so the do/catch is exhaustive.
            completionHandler(.failure(.streamError))
        }
    }
}
Everything works as expected if I either 1) send messages frequently, i.e. with less than about a 1-minute interval between them, or 2) leave more than about 3 minutes between messages. The problem appears when I send a message at time T and then another at, say, T+75 seconds.
In this case, I see the following in the debugger:
2023-05-16 23:08:08.564625+0200 MyAppName[67708:2229729] [quic] quic_conn_keepalive_handler [C3.1.1.1:2] [-0151e271df8314d16b51a3715f831c5705eff3a3] keep-alive timer fired, exceeding 2 outstanding keep-alives
2023-05-16 23:08:08.565701+0200 MyAppName[67708:2229729] [connection] nw_read_request_report [C3] Receive failed with error "Operation timed out"
2023-05-16 23:08:08.565843+0200 MyAppName[67708:2229729] [connection] nw_read_request_report [C3] Receive failed with error "Operation timed out"
2023-05-16 23:08:08.566089+0200 MyAppName[67708:2229729] [connection] nw_read_request_report [C3] Receive failed with error "Operation timed out"
2023-05-16 23:08:08.566243+0200 MyAppName[67708:2229729] [connection] nw_read_request_report [C3] Receive failed with error "Operation timed out"
2023-05-16 23:08:08.568284+0200 MyAppName[67708:2229729] [h3connection] 0x7fa55a05ba18 3 stalled, attempting fallback
2023-05-16 23:08:08.568506+0200 MyAppName[67708:2229729] Task <AEBB3060-6FA0-4A9C-9AC8-17B7ACD293E1>.<2> HTTP load failed, 212/0 bytes (error code: -1005 [4:-4])
2023-05-16 23:08:08.568630+0200 MyAppName[67708:2229729] [] nw_endpoint_flow_fillout_data_transfer_snapshot copy_info() returned NULL
2023-05-16 23:08:08.571567+0200 MyAppName[67708:2229808] Task <AEBB3060-6FA0-4A9C-9AC8-17B7ACD293E1>.<2> finished with error [-1005] Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=-4, NSUnderlyingError=0x60000001cfc0 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x600002df31b0 [0x7ff848a9f4e0]>{length = 16, capacity = 16, bytes = 0x100201bb681206c00000000000000000}, _kCFStreamErrorCodeKey=-4, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <AEBB3060-6FA0-4A9C-9AC8-17B7ACD293E1>.<2>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalDataTask <AEBB3060-6FA0-4A9C-9AC8-17B7ACD293E1>.<2>"
), NSLocalizedDescription=The network connection was lost., NSErrorFailingURLStringKey=https://api.openai.com/v1/chat/completions, NSErrorFailingURLKey=https://api.openai.com/v1/chat/completions, _kCFStreamErrorDomainKey=4}
If I then attempt another request (i.e. after the failure), it works just fine.
Does anyone have a clue what's going on here? My best guess is that Alamofire (or perhaps a lower-level system library) is attempting to send the subsequent request over the original, now-dead connection. I've used Wireshark to inspect the packets sent to and from the server, and it looks like a QUIC handshake takes place for the first message but not for the second one. I am by no means a network expert, though, so I'm not sure whether this is relevant.
Update: I'm experiencing exactly the same issue with the standard AF.request and no streaming responses, so it isn't specific to AF.streamRequest. So far, my workaround is to attach an interceptor to the request that automatically retries on failure, but I would still love a less hacky solution.
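For context, the retry idea maps onto Alamofire's RequestInterceptor/RetryPolicy machinery (to my understanding, Alamofire 5 even ships a ConnectionLostRetryPolicy aimed at exactly this -1005 error). Below is a Foundation-only sketch of the same pattern, with hypothetical helper names (`isRetryable`, `dataTaskWithRetry`) that I made up for illustration: retry once when the loader reports the connection was lost, since the fresh attempt goes out on a new connection.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession on Linux
#endif

// Decide whether a failed request is worth retrying. NSURLErrorNetworkConnectionLost
// (-1005) is the error from the log above; NSURLErrorTimedOut is a close cousin.
func isRetryable(_ error: Error, attempt: Int, maxAttempts: Int = 2) -> Bool {
    guard attempt < maxAttempts else { return false }
    let nsError = error as NSError
    return nsError.domain == NSURLErrorDomain &&
        (nsError.code == NSURLErrorNetworkConnectionLost ||
         nsError.code == NSURLErrorTimedOut)
}

// A thin wrapper around URLSession that re-issues the request when the
// connection was dropped underneath it, up to the attempt limit.
func dataTaskWithRetry(session: URLSession = .shared,
                       request: URLRequest,
                       attempt: Int = 0,
                       completion: @escaping (Result<(Data, URLResponse), Error>) -> Void) {
    session.dataTask(with: request) { data, response, error in
        if let error = error {
            if isRetryable(error, attempt: attempt) {
                // The dead connection has been discarded; this attempt
                // is negotiated on a fresh one.
                dataTaskWithRetry(session: session, request: request,
                                  attempt: attempt + 1, completion: completion)
            } else {
                completion(.failure(error))
            }
        } else if let data = data, let response = response {
            completion(.success((data, response)))
        }
    }.resume()
}
```

One caveat with blanket retries: POST requests aren't idempotent, which is presumably why a retry policy shouldn't fire for errors where the request may have reached the server; -1005 on send is usually safe, but it's worth being deliberate about which error codes you retry.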