
In the talk "Beyond DevOps: How Netflix Bridges the Gap," around 29:10 Josh Evans mentions squeeze testing as something that can help them understand system drift. What is squeeze testing and how is it implemented?

Dmitry Chornyi
    Interesting, I am watching Josh Evans talk about that in this video: https://www.youtube.com/watch?v=CZ3wIuvmHeM. He mentions it around 34:00. I had to Google it :) – agm1984 Jan 08 '18 at 03:31
    Arrived here after hearing this in the same video. – Janek Bogucki Nov 01 '20 at 16:59

2 Answers


It seems to be a term used by the folks at Netflix.

It means running tests/benchmarks to detect changes in performance and to calculate the application's breaking point. You can then check whether the latest change made the service less efficient, or determine recommended auto-scaling parameters before deploying it.
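As a small worked illustration (not Netflix's actual formula), one way to turn a measured breaking point into an auto-scaling parameter is to scale out when per-instance load approaches a safety fraction of the measured capacity:

```python
def scale_up_threshold_rps(breaking_point_rps, headroom=0.6):
    """Trigger scale-out once an instance reaches `headroom` of its
    measured carrying capacity, leaving margin for traffic spikes.
    The 0.6 headroom factor here is an arbitrary example value."""
    return breaking_point_rps * headroom

# If squeeze testing shows an instance breaks at 1000 RPS,
# scale out once sustained load passes 600 RPS per instance.
threshold = scale_up_threshold_rps(1000)  # 600.0
```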

There is a little more information here and here:

One practice which isn’t yet widely adopted but is used consistently by our edge teams (who push most frequently) is automated squeeze testing. Once the canary has passed the functional and ACA analysis phases the production traffic is differentially steered at an increased rate against the canary, increasing in well-defined steps. As the request rate goes up key metrics are evaluated to determine effective carrying capacity; automatically determining if that capacity has decreased as part of the push.
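The stepped ramp described in the quote can be sketched as a simple control loop. `send_at_rps` and `read_error_rate` below are hypothetical hooks standing in for whatever traffic-steering and metrics systems you actually have; this is a sketch of the idea, not Netflix's implementation.

```python
import time

def squeeze_test(instance, send_at_rps, read_error_rate,
                 start_rps=100, step_rps=100, max_rps=2000,
                 error_threshold=0.01, step_seconds=60):
    """Ramp traffic against one instance in well-defined steps and
    return the last rate it sustained within the error budget."""
    carrying_capacity = 0
    for rps in range(start_rps, max_rps + 1, step_rps):
        send_at_rps(instance, rps)        # steer `rps` req/s at the canary
        time.sleep(step_seconds)          # let metrics settle at this step
        if read_error_rate(instance) > error_threshold:
            break                         # breaking point reached
        carrying_capacity = rps           # last healthy step
    return carrying_capacity
```

Comparing the capacity measured for the new build against the previous one is what tells you whether "that capacity has decreased as part of the push."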

zuazo

As someone who helped with the development of squeeze testing at Netflix: it uses the large volume of stateless requests from actual production traffic to test the system. One approach is to put inordinately more load on one instance of a service until it breaks, monitor the key performance metrics of that instance, and use that information to inform how auto-scaling policies are configured. This eliminates the problem of fake traffic not stressing the system in the right way.

The reasons it might not work for everyone:

  • you need more traffic than any one instance can handle.
  • the requests need to be somewhat uniform in demand on the service under test.
  • clients & protocol need to be resilient to errors should things go wrong.

The way it is set up: a proxy is put in front of the service, configured to send a specific RPS to one instance. I used Bresenham's line algorithm to evenly spread the fluctuation in incoming traffic over time into a precise outgoing RPS. Turn up the dial on the RPS, watch it burn.
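The Bresenham trick can be sketched as an error accumulator that decides, per incoming request, whether to route it to the squeezed instance: exactly `target_rps` out of every `incoming_rps` requests get forwarded, spaced as evenly as possible rather than in bursts. The names below are illustrative, not the actual proxy code.

```python
def route_decisions(incoming_rps, target_rps):
    """Yield True for each incoming request that should be forwarded to
    the squeezed instance. Bresenham-style: accumulate error per request
    and forward whenever it crosses the threshold, so target_rps is
    spread evenly across incoming_rps."""
    error = 0
    for _ in range(incoming_rps):
        error += target_rps
        if error >= incoming_rps:
            error -= incoming_rps
            yield True    # forward to the instance under test
        else:
            yield False   # route to the regular fleet

decisions = list(route_decisions(incoming_rps=10, target_rps=4))
# 4 of every 10 requests forwarded, evenly interleaved:
# [F, F, T, F, T, F, F, T, F, T]
```

The same accumulator keeps the outgoing rate precise even as `incoming_rps` fluctuates from second to second, which is the point of borrowing the line-drawing algorithm.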