According to the SQS documentation, the maximum number of in-flight messages is 120,000 for standard queues. However, sometimes I see my queues maxing out at lower numbers, such as here: [screenshot: max in-flight messages at 100K]

Does anyone know why this might be the case? I have code that dynamically changes the number of SQS listeners depending on the number of messages in the queue, but I don't want to do anything if I've hit the maximum. My problem now is that the max limit doesn't seem to be consistent. Some queues go to 120K, but this one is stuck at 100K instead, and as far as I can tell there is no setting that lets me change this limit.

mtk
howcheng
  • 1
    You really shouldn't ever have 120K messages in flight under normal conditions. This is a protective control, and should normally be a sign that something is seriously wrong, unless, of course, you actually have 12,000 consumers each processing 10 messages at a time, or you're doing something highly unusual with SQS. Normally, no more than #-consumers × #-messages they each request (1-10) should be in flight unless you are not handling errors properly. – Michael - sqlbot Mar 30 '18 at 00:43
  • _WHY_ do you require so many 'in flight' messages? How are you consuming and using the messages? – John Rotenstein Mar 30 '18 at 07:18
  • @JohnRotenstein My app is like a postmaster: it downloads messages in bulk and blasts them out to web farms to do the actual work. I have apps dumping hundreds of thousands of messages at a time into the queues. If I have a small # of clients, it will take a really long time to process through all of them. This architecture allows other dev teams in the company to just build APIs and not worry about queue management. So it's not unusual for us to hit the max in-flight count at any time. – howcheng Mar 30 '18 at 16:24
  • @Michael-sqlbot Yes, we actually can have 12,000 consumers processing 10 messages at a time. – howcheng Mar 30 '18 at 16:25
  • @howcheng... okay, just checking. I have no insight into the architecture of SQS, but if I were to engage in some rampant numerological speculation based on the roundness of 100,000 and the strangeness of 120,000, the magic numbers 3 (availability zones) and 2 (redundant systems) we might conclude that an SQS queue has 6 nodes (2 each x 3 AZs) capable of 20K concurrent messages in-flight and your queue may be experiencing a transient capacity issue due to a problem in one of the 6 nodes. It seems unlikely that I am right, but the numbers are suspicious. How long has this been going on? – Michael - sqlbot Mar 30 '18 at 18:59
  • @Michael-sqlbot It was just something I had noticed. I don't know if it's consistent or what, but your guess certainly sounds plausible. – howcheng Mar 30 '18 at 20:49

1 Answer

ApproximateNumberOfMessagesNotVisible indicates the number of messages in flight, as you rightly said. It depends on how many consumers you have and on the throughput of each consumer.

If the actual number is capping at 100k, then your consumers are swamped and have no more receiving capacity.
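As a rough illustration of how a listener-scaling loop could act on the in-flight count, here is a minimal Python sketch. It assumes boto3 as the client library, and the helper names (`inflight_count`, `expected_max_inflight`, `should_add_listeners`, `fetch_attributes`), the 5% headroom, and the default limit of 120,000 are all hypothetical choices made for this example, not anything SQS prescribes. Because the observed ceiling may differ from the documented one, the limit is passed in as a parameter rather than hard-coded.

```python
from typing import Mapping

# Assumption: the documented in-flight ceiling for standard queues cited in the question.
DOCUMENTED_INFLIGHT_LIMIT = 120_000


def inflight_count(attributes: Mapping[str, str]) -> int:
    """Read the in-flight count from a GetQueueAttributes 'Attributes' map."""
    return int(attributes.get("ApproximateNumberOfMessagesNotVisible", "0"))


def expected_max_inflight(consumers: int, batch_size: int = 10) -> int:
    """Upper bound on in-flight messages if each consumer holds one full batch."""
    if not 1 <= batch_size <= 10:
        raise ValueError("ReceiveMessage returns between 1 and 10 messages")
    return consumers * batch_size


def should_add_listeners(
    attributes: Mapping[str, str],
    limit: int = DOCUMENTED_INFLIGHT_LIMIT,
    headroom: float = 0.05,
) -> bool:
    """Scale up only if a backlog remains and in-flight is clearly below the limit."""
    backlog = int(attributes.get("ApproximateNumberOfMessages", "0"))
    return backlog > 0 and inflight_count(attributes) < limit * (1 - headroom)


def fetch_attributes(queue_url: str) -> Mapping[str, str]:
    """Fetch both approximate counters for one queue (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    sqs = boto3.client("sqs")
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    return response["Attributes"]
```

A scaling loop might call `should_add_listeners(fetch_attributes(queue_url), limit=observed_limit)`, where `observed_limit` is whatever ceiling the queue has actually been seen to reach. Both counters are approximate, so the headroom margin is there to avoid scaling decisions that flip on small fluctuations.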

Anyway, it would help if you provided more information on the use case, as 100k in-flight messages looks out of the ordinary and you may not be using the right solution for your problem.

mtk