What is this kind of continuous traffic generation against production called?

Question

Traffic to our service is not entirely predictable. To help keep the service slightly over-provisioned and to provide advance warning of any degradation resulting from an increase in traffic, we maintain a kind of "continuous buffer load generator". This generates a constant load against our production API, on top of user traffic. If we find that service is degrading, it is automatically turned off, and ideally we have a bit of time to figure out the issue and scale up before natural user traffic matches the augmented traffic that had led to service degradation. The buffer load is turned back on once the service is stable again.

While we've been calling this continuous traffic generation a "continuous load test", that wording is confusing and makes it hard to distinguish from "actual" load tests (which is what I'll call experiments with a defined beginning and end, a load pattern, and a binary pass/fail result at the end). I almost want to call it "canary traffic", since we're sending in additional traffic to warn us of issues before our users encounter them, but that doesn't line up well with the general understanding of "canary" in this industry.

This is an additional strategy on top of load balancing, autoscaling, etc. We're not trying to replace any industry-standard traffic management steps here.

I suspect this is a case of not knowing the right words to Google, so:

If this is a pattern used elsewhere, what is it called?
Alternatively, if no one else does this, why not? I'm entirely willing to be convinced that this result can be better obtained with some other kind of testing or monitoring.

score 1 · Accepted Answer · answered Jul 21 '20 at 02:01

Synthetic monitoring (also called active monitoring) is the term for artificial load that simulates what the application actually does, generated in order to measure application performance.

Simulating your actual load is fantastic. However, spending a sizable fraction of your production resources on artificial traffic is not efficient. More importantly, the automatic disable mechanism becomes critical to maintaining good performance: if it ever fails to switch the load off, the buffer traffic itself degrades the service. Instead, throttle the synthetic traffic back to a minimum level at all times, and continue measuring response times and error rates. Never stop measuring, because degradation in those numbers shows the user impact of events.
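As a rough illustration of "minimum level, always measuring", here is a minimal Python sketch. Everything in it is an assumption for illustration: `run_probes`, `ProbeStats`, and the `probe` callable are hypothetical names, and `probe` stands in for whatever single realistic request you would send to your own API. It only records latency and error rate for a small number of requests; it does not generate meaningful load.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeStats:
    samples: int
    error_rate: float      # fraction of probes that failed, 0.0 to 1.0
    p95_latency_s: float   # nearest-rank 95th percentile, in seconds


def run_probes(probe: Callable[[], bool], count: int,
               interval_s: float = 0.0) -> ProbeStats:
    """Run `probe` `count` times at a low, fixed rate and summarize the results.

    `probe` is a hypothetical callable that performs one realistic request
    and returns True on success. An exception is counted as a failure.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    latencies = []
    errors = 0
    for _ in range(count):
        start = time.perf_counter()
        try:
            ok = probe()
        except Exception:
            ok = False
        latencies.append(time.perf_counter() - start)
        if not ok:
            errors += 1
        if interval_s > 0:
            time.sleep(interval_s)  # keep the synthetic rate deliberately low
    latencies.sort()
    idx = math.ceil(0.95 * len(latencies)) - 1  # nearest-rank percentile index
    return ProbeStats(count, errors / count, latencies[idx])
```

In practice you would feed these numbers into whatever monitoring system you already run rather than compute them by hand; the point is only that a trickle of probes is enough to keep the measurements flowing.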

Realistic load generators are good for testing and capacity planning. Provision a different compute instance size in a test environment, and push the load until it falls over. As a part of a high availability test or rolling upgrade, add some load temporarily to validate an otherwise idle system.

Decide what the response time objective is. Learn how many requests per second the service can safely handle. Set auto scaling or alerts at actionable thresholds.
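One way to picture those three steps is the sketch below. It assumes you have already measured a safe requests-per-second figure in a test environment and chosen a latency objective; the function name, the `Action` values, and the 80% scale-up fraction are all hypothetical choices for illustration, not recommendations.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    SCALE_UP = "scale_up"
    ALERT = "alert"


def evaluate(current_rps: float, safe_rps: float,
             p95_latency_s: float, latency_objective_s: float,
             scale_fraction: float = 0.8) -> Action:
    """Pick an action from measured traffic and latency.

    safe_rps: the request rate found to be safe in testing (assumed known).
    scale_fraction: hypothetical headroom threshold; scale up once traffic
    reaches this fraction of safe_rps.
    """
    # Objective already violated: users are affected, so alert a human.
    if p95_latency_s > latency_objective_s:
        return Action.ALERT
    # Still within the objective but running out of headroom: add capacity.
    if current_rps >= scale_fraction * safe_rps:
        return Action.SCALE_UP
    return Action.OK
```

For example, with a safe rate of 1000 requests per second and a 0.5 s objective, `evaluate(850, 1000, 0.2, 0.5)` returns `Action.SCALE_UP`: the headroom comes from the threshold, not from artificial traffic.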

Measuring service level objectives, plus knowing the limits, will give you the tools to do proper capacity planning without burning your buffer capacity artificially.

I tried to write the question in as unbiased a way as possible, but my gut was giving me similar opinions to you. I appreciate the pointers on what kinds of work I should do so we fully understand our system's limits so I can advocate for deprecating this system in the future. — KernelPanic, Jul 26 '20 at 19:20
