
We are now working on exposing our production REST API to the outside world to allow third parties to integrate with our system. One of our issues is that, for scale and performance reasons, many of the API commands are handled asynchronously, so the result can't be returned directly to the caller.

For example, a deliver order command might take some time to complete, meaning that:

  1. In the response body we can't return the delivery files, as they have not been processed yet; we can only return a 202 Accepted status.
  2. We can't guarantee that the deliveries will be ready on the next call to the get deliveries API.

We have a few ideas on how to address this async problem, but we were wondering whether there are best practices for async systems exposing an API. Most of our ideas involve 202 status codes, or perhaps a command ID that clients can poll on or register a webhook for, which seems tedious.
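To make it concrete, the 202-plus-command-ID response we have in mind would look roughly like this; the endpoint paths and field names are just placeholders, not our actual API:

```python
import uuid

def accept_deliver_order(order: dict) -> tuple[int, dict]:
    """Accept the command immediately and return 202 with a pollable command ID.

    Placeholder sketch: real handling would enqueue the order for async work.
    """
    command_id = str(uuid.uuid4())
    body = {
        "commandId": command_id,
        "status": "ACCEPTED",
        # Hypothetical URL the client can poll for progress:
        "statusUrl": f"/v1/commands/{command_id}",
    }
    return 202, body

status_code, body = accept_deliver_order({"orderId": "123"})
```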

Is it acceptable for clients of these APIs to understand that the actions they perform take time to complete, and that a synchronous response will not always be available?

Eliranf
  • Presumably the operation to begin the overall process will return an identifier which can later be used to query status of the overall process? – David Oct 14 '21 at 11:06
  • Yes, that's an option - I referenced it as `command ID`. The question is if it is a good & convenient interface between our system and its client. – Eliranf Oct 14 '21 at 11:08

1 Answer


There are well-understood patterns for exposing long-running, async operations over HTTP APIs. How you go about it will depend on your specific requirements; here are some considerations:

  1. Short polling. In this pattern, you have a "start" HTTP API which returns an HTTP 202 and a unique identifier of the async operation. Then you have a "query" HTTP API that accepts that unique identifier and responds with the status of the operation. It is up to the caller to decide how frequently they want to call the "query" API, but in general, there will be a lag between the async operation completing and the moment the caller finds out about it. An example of this pattern is AWS's CloudWatch Logs Insights StartQuery and GetQueryResults APIs.

  2. Long polling. In this pattern, the "query" HTTP API is an HTTP long poll, which means the server will keep it open until the operation has completed, or a maximum amount of time has elapsed (typically less than 45 seconds). If the long poll returns before the async operation has completed, the caller is expected to immediately issue a new "query" request. This pattern is usually more complex for you to implement and more resource intensive (it keeps TCP connections open), but it lowers the latency with which the caller learns of the operation's completion.

  3. Websockets. In this pattern, the caller creates a persistent websocket connection to start the async operation and wait for its completion. This offers the lowest latency in notifying the caller about completion but goes beyond the "bread and butter" of regular HTTP's RPC semantics, so it requires a bit more sophistication on the caller's side.

  4. Webhooks. Some systems choose to expose webhooks to notify the customers about important events that occur asynchronously. Depending on the duration of your async operations this may be the right pattern to use. For example, Stripe uses webhooks a lot to notify customers about payment status changes.
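The short-polling client loop from (1) can be sketched as follows. The `start_operation` and `get_status` callables stand in for the two HTTP APIs; all names here are assumptions for illustration, not a real SDK:

```python
import time

def poll_until_done(start_operation, get_status, interval_s=2.0, timeout_s=60.0):
    """Start an async operation and short-poll its status until a terminal state."""
    op_id = start_operation()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(op_id)
        if status["state"] in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(interval_s)  # the lag between polls is the latency trade-off
    raise TimeoutError(f"operation {op_id} did not finish within {timeout_s}s")

# Usage with in-memory stubs standing in for the two HTTP calls:
calls = {"n": 0}

def fake_start():
    return "op-1"

def fake_status(op_id):
    calls["n"] += 1
    return {"state": "SUCCEEDED" if calls["n"] >= 3 else "RUNNING"}

result = poll_until_done(fake_start, fake_status, interval_s=0.01)
```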

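A minimal sketch of the server side of the long poll in (2): the "query" handler blocks on an event that a background worker sets on completion, returning a "poll again" hint if the maximum wait elapses first. Class and field names are assumptions:

```python
import threading

class Operation:
    """Server-side state for one async operation, supporting a long-poll query."""

    def __init__(self):
        self._done = threading.Event()
        self.result = None

    def complete(self, result):
        """Called by the background worker when the async work finishes."""
        self.result = result
        self._done.set()

    def long_poll(self, max_wait_s=30.0):
        """Block until completion or max_wait_s; PENDING means 'poll again now'."""
        if self._done.wait(timeout=max_wait_s):
            return {"state": "SUCCEEDED", "result": self.result}
        return {"state": "PENDING"}

# Usage: a worker completes the operation while the caller is long-polling.
op = Operation()
threading.Timer(0.05, op.complete, args=["delivery-files-ready"]).start()
first = op.long_poll(max_wait_s=0.01)   # times out before the worker finishes
second = op.long_poll(max_wait_s=1.0)   # worker finishes within this wait
```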
Besides the handling of async operations, there are a number of other considerations when exposing HTTP APIs from your system with the idea of helping folks integrate with your application.
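One such consideration, if you expose webhooks as in (4), is letting receivers authenticate deliveries. A common scheme (used, in spirit, by Stripe) is an HMAC signature over the payload; here is a sketch with an assumed shared secret:

```python
import hashlib
import hmac

def sign(secret: bytes, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature the sender attaches to a delivery."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    """Receiver-side check; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(secret, payload), signature)

secret = b"whsec_example"  # assumed shared secret, exchanged out of band
payload = b'{"event": "delivery.completed", "orderId": "123"}'
sig = sign(secret, payload)
```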

Tomasz Janczuk