
I'm trying to implement a concurrency limiter in a web application where the average response time is 5 ms.

I've based my implementation on "Performance Under Load", forking the concurrency-limits project and using the concepts explained in the amazing tech talk "Stop Rate Limiting! Capacity Management Done Right".

I've measured the number of concurrent requests that my application handles under normal conditions (normal requests per second, normal response time) and got the following:

1) The average value is 1.7

2) The 95th percentile is 3.2

3) The max value reaches 45, 50 or 60 depending on the sample

With those numbers I decided to configure a maximum capacity of 45 concurrent requests. At that point I hadn't asked myself why the max value was so far from the average.
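For context, the way N is sampled is essentially an in-flight counter around each request. A hypothetical sketch (the filter name and wiring are illustrative, not my actual code):

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicInteger;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    // Hypothetical sketch: sample N with a servlet filter. Each request bumps a
    // shared counter on entry and releases it on exit, so the counter's value at
    // any instant is the number of in-flight requests.
    public class InFlightCounterFilter implements Filter {

        private static final AtomicInteger IN_FLIGHT = new AtomicInteger();

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            int n = IN_FLIGHT.incrementAndGet(); // N as seen by this request
            try {
                // n would be recorded into a histogram here to derive avg / p95 / max
                chain.doFilter(req, res);
            } finally {
                IN_FLIGHT.decrementAndGet();
            }
        }

        @Override
        public void init(FilterConfig filterConfig) { }

        @Override
        public void destroy() { }
    }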

Then I started testing the concurrency limiter and found that some requests are eventually rejected because they hit the threshold of 45 that I configured.

It's worth mentioning that I'm using an AIMDLimit implementation to modify the max capacity on the fly, as sketched below. I've also measured every value, and the max capacity is never less than 40.
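This is roughly how the limiter is wired up, assuming the library's builder API (the values and class names around it are illustrative, not my exact code):

    import java.util.Optional;

    import com.netflix.concurrency.limits.Limiter;
    import com.netflix.concurrency.limits.limit.AIMDLimit;
    import com.netflix.concurrency.limits.limiter.SimpleLimiter;

    // Sketch: AIMD grows the limit additively while requests succeed and backs
    // off multiplicatively when they are dropped, so max capacity moves on the fly.
    public class LimitedHandler {

        private final Limiter<Void> limiter = SimpleLimiter.newBuilder()
                .limit(AIMDLimit.newBuilder()
                        .initialLimit(45) // the capacity derived from the measurements above
                        .build())
                .build();

        public void handle(Runnable request) {
            Optional<Limiter.Listener> token = limiter.acquire(null);
            if (!token.isPresent()) {
                return; // limit reached: reject the request (e.g. HTTP 429)
            }
            try {
                request.run();
                token.get().onSuccess(); // feeds the AIMD algorithm
            } catch (RuntimeException e) {
                token.get().onIgnore(); // don't count application errors as congestion
                throw e;
            }
        }
    }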

So I did some research on my application and found the following: every time my application performs a minor GC or a major GC (using CMS), the value of N increases a lot. The measured values go from 1, 2 or even 3 up to 10, 11, 12, and when a full GC is performed the measured value even goes up to 40, 50, 60. (This is the point when N is higher than my threshold and requests are rejected.)

This behavior makes sense: since my application runs behind a Tomcat container, and Tomcat holds pending requests in an OS-level queue before polling them (see "Tuning Tomcat For A High Throughput, Fail Fast System"), it is expected that the value of N increases when a minor or major GC is performed.

For instance, let's analyze the following situation.

1) The application is handling 3 concurrent requests

2) A GC is performed that takes 30 ms

3) 10 more requests arrive and are held in the OS queue, waiting for Tomcat to poll them

4) The GC finishes

5) The 10 requests are polled and the value of N (concurrent requests) jumps to 13
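This matches the arrival rate implied by my normal-load measurements (Little's law, using the averages from above):

    throughput ≈ N / T = 1.7 / 0.005 s ≈ 340 requests/s
    arrivals during a 30 ms pause ≈ 340 requests/s × 0.030 s ≈ 10 requests

So a pause of a few tens of milliseconds is enough to stack up roughly the burst described in step 3.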

The issue here is that I've also measured the GC times of my application using jstat, and they do not look so bad:

+-----------+------+-------+-------+--------+-------+-------+------+---------+-----+--------+---------+
| Timestamp |  S0  |  S1   |   E   |   O    |    M  |  CCS  | YGC  |   YGCT  | FGC |  FGCT  |   GCT   |
+-----------+------+-------+-------+--------+-------+-------+------+---------+-----+--------+---------+
| 91071.2   | 0.00 | 10.13 | 94.37 | 56.99  | 96.92 | 95.11 | 4399 | 368.077 | 64  | 22.428 | 390.505 |
| 91073.2   | 8.36 | 0.00  | 3.18  | 57.16  | 96.92 | 95.11 | 4400 | 368.178 | 64  | 22.428 | 390.606 |
| *******   | **** | ****  | ****  | *****  | ***** | ***** | **** | ******* | **  | ****** | ******* |
| *******   | **** | ****  | ****  | *****  | ***** | ***** | **** | ******* | **  | ****** | ******* |
| 91099.9   | 9.69 | 0.00  | 99.87 | 32.73  | 96.78 | 94.90 | 4386 | 318.084 | 66  | 19.694 | 337.778 |
| 91101.9   | 0.00 | 9.60  | 9.72  |  32.99 | 96.78 | 94.90 | 4387 | 318.135 | 66  | 19.694 | 337.830 |
| *******   | **** | ****  | ****  | *****  | ***** | ***** | **** | ******* | **  | ****** | ******* |
| *******   | **** | ****  | ****  | *****  | ***** | ***** | **** | ******* | **  | ****** | ******* |
+-----------+------+-------+-------+--------+-------+-------+------+---------+-----+--------+---------+

Those measurements are from young collections, and it is visible that the young collection times do not last very long (the YGCT column is cumulative, so each delta is the duration of a single collection):

368.077 -> 368.178 (101 ms)
318.084 -> 318.135 (51 ms)

I've also measured full GC times:

+-----------+------+------+-------+-------+--------+-------+-------+---------+-----+--------+---------+
| Timestamp |  S0  |  S1  |   E   |   O   |    M   |  CCS  |  YGC  |   YGCT  | FGC |  FGCT  |   GCT   |
+-----------+------+------+-------+-------+--------+-------+-------+---------+-----+--------+---------+
| *******   | **** | **** | ****  | ***** | *****  | ***** | ****  | ******* | **  | ****** | ******* |
| 91879.8   | 0.00 | 7.51 | 23.57 | 68.12 | 96.92  | 95.11 |  4437 | 372.348 | 65  | 22.432 | 394.780 |
| 91881.8   | 6.58 | 0.00 | 8.25  |  9.51 |  96.92 | 95.12 |  4438 | 372.465 | 66  | 23.066 | 395.531 |
+-----------+------+------+-------+-------+--------+-------+-------+---------+-----+--------+---------+

22.432 -> 23.066 (634 ms)

I believe the full GC measurement does not imply a stop-the-world pause for the whole duration: most CMS phases run concurrently with the application, and only the initial mark and remark phases stop the world.

Another thing I've done is run jstat in one tab and a log tracking the value of N (concurrent requests) in another tab. As I expected, every time a young or full GC is triggered, N goes up a lot.
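For reference, the samples above were collected with an invocation along these lines; <pid> stands for the Tomcat process id, and the 2-second interval matches the timestamps in the tables:

    jstat -gcutil -t <pid> 2000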

So, after this preface, my question is:

Is there any good way to limit the concurrency capacity of an application where the GC pauses take longer than the average response time?

It's also worth mentioning that the GC pauses we have are not a problem for our round-trip request time. In other words, they are not a problem for the clients, and it is not my intention to start a discussion about how they can be improved or whether CMS is deprecated, etc.

Thanks in advance!


1 Answer


My first thought is to look at other garbage collectors since there have been many improvements since CMS, but you stated you aren't looking to explore that... ;-)

There are various aspects of CMS behavior that are tunable and might help your situation. One is incremental mode, which you can enable with -XX:+CMSIncrementalMode.

From Concurrent Mark Sweep (CMS) Collector:

Normally, the CMS collector uses one or more processors during the entire concurrent tracing phase, without voluntarily relinquishing them. Similarly, one processor is used for the entire concurrent sweep phase, again without relinquishing it. This overhead can be too much of a disruption for applications with response time constraints that might otherwise have used the processing cores, particularly when run on systems with just one or two processors. Incremental mode solves this problem by breaking up the concurrent phases into short bursts of activity, which are scheduled to occur midway between minor pauses.

There are also some knobs to turn around "duty cycle" (in the same doc linked above) that might help – for example: -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 – but they're less straightforward; you would need to spend some time testing and observing.
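For illustration, putting those flags together on the command line could look like this; heap settings are omitted and app.jar is a placeholder for your own artifact:

    java -XX:+UseConcMarkSweepGC \
         -XX:+CMSIncrementalMode \
         -XX:+CMSIncrementalPacing \
         -XX:CMSIncrementalDutyCycleMin=0 \
         -XX:CMSIncrementalDutyCycle=10 \
         -jar app.jar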

Kaan
  • Hi @kaan, thanks for your answer! Migrating the garbage collector itself might improve this behaviour, but I'm trying to figure out whether it's possible to improve my current use case from the perspective of concurrency-limiting the application: some algorithm that may react differently to spikes (no matter whether the spike is due to a GC pause or a request burst). The open source project that I've based my implementation on uses AIMD to change the maxInflightRequest. – Martin Locurcio Dec 14 '19 at 16:18