
Regarding this statement on a blog about Databricks SQL:

> **Throughput vs latency trade off**
>
> Throughput vs latency is the classic tradeoff in computer systems, meaning that a system cannot get high throughput and low latency simultaneously. If a design favors throughput (e.g. by batching data), it would have to sacrifice latency. In the context of data systems, this means a system cannot process large queries and small queries efficiently at the same time.

Doesn't low latency mean high throughput by definition? Why are they suggesting that low latency implies low throughput?

If throughput refers to the count of requests fulfilled in a given time, and latency refers to the time to serve a single request, then surely less time per request means we can serve more requests in the same time frame.

For instance, if latency is 1 second per request, then the server can process 10 requests in 10 seconds.

If latency is reduced to 0.5 second per request, then the server's throughput is 20 requests in 10 seconds.
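
In code, that reasoning looks something like this (a toy sketch that assumes a single worker serving requests strictly one at a time, which is the hidden assumption in my arithmetic):

```python
# Toy model: one worker serving requests strictly one at a time.
# Under this assumption, throughput is simply the window divided by latency.
def throughput(latency_s: float, window_s: float = 10.0) -> float:
    """Requests completed in `window_s` seconds at `latency_s` per request."""
    return window_s / latency_s

print(throughput(1.0))  # 10.0 -> 10 requests in 10 s
print(throughput(0.5))  # 20.0 -> 20 requests in 10 s
```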

Shouldn't low latency mean high throughput by this definition?

    "latency means time to serve a request" That is too vague to be useful. And it seems you are misinterpreting it. Quote authoritative definitions. "low latency gives low throughput" Who says? Where? What is the justification? [ask] [Help] [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) – philipxy Feb 28 '22 at 00:48
  • 1
    https://www.youtube.com/watch?v=PXHLZGp-XMc trade off – xiaokang lin Feb 28 '22 at 01:09
  • You are correct, who is saying low latency means low throughput? It is true of some poorly designed systems that latency increases as a function of frequency. In these systems low latency can only be observed at low throughput rates but that shouldn't be accepted as normal – Chris Schaller Feb 28 '22 at 01:31
  • IoT is a specific domain where we can achieve higher throughput by batching the messages for processing, but although this technique helps us achieve higher throughput rates than can easily be achieved using per message rates, it has the highest latency for individual messages. In this case the low latency vs high latency represent entirely different types of processes. – Chris Schaller Feb 28 '22 at 01:38
  • 2
    Put what is needed to ask your question in your post with reference, not just at a link. Especially don't expect us to watch a video. Paraphrase/quote as needed (and only that) from other sources; relate to your question, don't expect us to read entire other sources/quotes and/or figure out what is relevant. And you still aren't quoting a reasonably researched definition of "latency". PS A blog is not an authoritative source. It's fine to question what they write, but research & also address what they say given what your research shows. PS Clarify via edits, not comments. – philipxy Feb 28 '22 at 06:50
  • 3
    Who cares what a random person says somewhere on the web? Research in authoritative sources like textbooks & papers. Also search via google with 'site:' on [so] before considering posting a question. Relate research to what anyone says and/or your own points. PS Your post still doesn't use "latency" according to what it actually means. – philipxy Feb 28 '22 at 07:03
  • 1
    If you think that a blog post is unclear or wrong, why not ask the author of that post for clarification? – Nico Haase Mar 27 '22 at 08:26

1 Answer


You are correct: as a general concept, a low-latency system takes less time to process a single operation, and could therefore process more messages in the same period than an otherwise identical system exhibiting a longer latency.

But in practice, especially in programming, the latency of a system can be affected by its throughput. We may need to allow resources to be cleaned up and become ready again between cycles; some of these resources may be databases that enforce throttling limits, or other processes that themselves have safe operating limits. At some point we will often hit limitations of a given processing model that force us to change our approach.

If we scale our processors out over more resources, we may observe a significant rise in the cost of processing per message, and even then we may still run into maximum-throughput ceilings.

In these systems it is common to observe latency increasing as the throughput requirement increases; low latency can only affordably be observed at low throughput rates. A textbook way to see why is sketched below.
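
As a rough illustration (this is the classic M/M/1 queueing model from textbooks, not a model of any particular product, and the capacity figure is made up): on a resource with fixed capacity, queueing pushes mean latency up sharply as offered load approaches that capacity:

```python
# M/M/1 queue: a server with capacity `mu` requests/s receiving `lam` requests/s.
# Mean time in system (queueing + service) is W = 1 / (mu - lam),
# so latency explodes as throughput approaches capacity.
def mean_latency(lam: float, mu: float) -> float:
    if lam >= mu:
        raise ValueError("offered load meets or exceeds capacity; queue grows without bound")
    return 1.0 / (mu - lam)

MU = 100.0  # illustrative capacity: 100 requests/s
for lam in (10, 50, 90, 99):
    print(f"{lam:3d} req/s -> mean latency {mean_latency(lam, MU) * 1000:6.1f} ms")
```

At 10 req/s mean latency is about 11 ms; at 99 req/s it is a full second, even though the server itself never got slower.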

IoT and realtime processing are common domains where we may need to achieve a higher throughput than our low-latency design can sustain; often this is realized by implementing batch processing.

Batch processing generally has significantly higher latency than most per-message flows, but overall it can allow a higher volume of messages to be processed using fewer resources.

In a batching system we can tune the throughput by altering the size of the batch: more messages in the batch means those messages have to wait longer to be processed, which increases latency, but larger batch sizes may increase total throughput, as the toy model below shows.
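
As a toy sketch (the cost numbers are illustrative only, not taken from any particular system): if each batch pays a fixed overhead plus a small per-message cost, growing the batch raises both throughput and per-message latency:

```python
# Toy batching model: each batch pays a fixed overhead (I/O, commit, scheduling)
# plus a small per-message cost. Numbers are illustrative only.
FIXED_OVERHEAD_S = 0.100  # seconds paid once per batch
PER_MESSAGE_S = 0.001     # seconds per message within a batch

def batch_stats(batch_size: int) -> tuple[float, float]:
    batch_time = FIXED_OVERHEAD_S + PER_MESSAGE_S * batch_size
    throughput = batch_size / batch_time  # messages per second
    # A message's latency is at least the time to process its whole batch
    # (ignoring the extra time spent waiting for the batch to fill).
    latency = batch_time
    return throughput, latency

for n in (1, 10, 100, 1000):
    tput, lat = batch_stats(n)
    print(f"batch={n:4d}: {tput:7.1f} msg/s, latency >= {lat * 1000:6.1f} ms")
```

Going from a batch of 1 to a batch of 1000, throughput climbs from roughly 10 msg/s to roughly 900 msg/s while per-message latency grows from about 100 ms to over a second: exactly the trade the blog is describing.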

It is this batch scenario that the "low latency = low throughput" dialog generally comes from. It is alluded to in this clip: https://www.youtube.com/watch?v=PXHLZGp-XMc

It is not that low-latency systems can only produce low throughput, but more specifically that low-throughput systems can more easily achieve lower latencies.

– Chris Schaller