Distributed system design for quota on API

Question

I am designing an API which can be hit only a defined number of times based on the subscription plan. Below are the plans per account:

10M hits per year - $100
100M hits per year - $300
1G hits per year - $600

I have this service running in multiple regions (say 5) and the system is distributed. I need to send a notification if the user exhausts their quota.

What would be the optimal system design to achieve this? What kind of DB should I use? How do I replicate this data across multiple zones while handling heavy concurrency?

score 1 · Answer 1 · answered Mar 22 '22 at 17:14

The very first question to ask is how hard the limit is, from a business point of view. For example, if a customer with a 10M quota goes over by 1%, is that a problem?

The second thing to look at is TPS: what is the traffic pattern? For example, 1G requests spread evenly over a year works out to ~32 requests per second. TPS matters because it may become your bottleneck, especially when you make cross-region calls.
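A quick check of that arithmetic for all three plans, as a Python sketch. It assumes a 365-day year and perfectly even traffic, which real traffic will not have, so treat the result as a lower bound on peak TPS rather than a capacity target:

```python
# Average requests per second if a yearly quota were spread evenly
# over a 365-day year. Real traffic is bursty, so peaks will be higher.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

PLANS = {
    "10M": 10_000_000,
    "100M": 100_000_000,
    "1G": 1_000_000_000,
}


def average_tps(yearly_quota: int) -> float:
    """Mean requests per second for an evenly spread yearly quota."""
    return yearly_quota / SECONDS_PER_YEAR


if __name__ == "__main__":
    for name, quota in PLANS.items():
        print(f"{name}: {average_tps(quota):.2f} req/s")
```

Running it prints about 0.32, 3.17 and 31.71 req/s for the 10M, 100M and 1G plans.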

The third thing to look at is availability: how available does your system need to be?

In any case, what you need is a counter: every request decrements it, and once it reaches zero, you stop serving requests.
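A minimal single-process sketch of such a counter in Python. It assumes one limit for every account and an in-memory dict guarded by a lock; `QuotaCounter` and the `on_exhausted` callback (which covers the notification asked for in the question) are hypothetical names, and a real deployment would keep the counter in a shared store rather than in one process:

```python
import threading
from typing import Callable, Dict


class QuotaCounter:
    """Per-account request counter kept in one process's memory.

    Hypothetical sketch: every account gets the same `limit`, and the
    state would live in a shared, strongly consistent store in practice.
    """

    def __init__(self, limit: int, on_exhausted: Callable[[str], None]) -> None:
        self._limit = limit
        self._remaining: Dict[str, int] = {}
        self._lock = threading.Lock()
        self._on_exhausted = on_exhausted

    def try_consume(self, account: str) -> bool:
        """Decrement the account's counter; refuse once it has reached zero."""
        with self._lock:
            remaining = self._remaining.get(account, self._limit)
            if remaining <= 0:
                return False
            remaining -= 1
            self._remaining[account] = remaining
            # The counter never goes back up here, so this is true exactly
            # once per account: on the request that uses the last unit.
            notify = remaining == 0
        # Call back outside the lock so a slow notifier does not block requests.
        if notify:
            self._on_exhausted(account)
        return True


if __name__ == "__main__":
    counter = QuotaCounter(limit=3, on_exhausted=lambda a: print(f"{a} exhausted"))
    print([counter.try_consume("acct-1") for _ in range(5)])
```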

These counters could be implemented in several ways.

For example, create a queue holding the given number of tokens; to process a request, a server has to take a token from the queue. No tokens left, no service.
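A toy version of the token queue, using Python's in-process `queue.Queue` as a stand-in for a shared queue service (an assumption; no specific queue product is named here). Pre-filling one token per request is only practical for small quotas:

```python
import queue


def make_token_queue(quota: int) -> "queue.Queue[int]":
    """Pre-fill a queue with one token per allowed request."""
    tokens: "queue.Queue[int]" = queue.Queue()
    for token_id in range(quota):
        tokens.put(token_id)
    return tokens


def handle_request(tokens: "queue.Queue[int]") -> bool:
    """Serve a request only if a token can be taken; never block waiting."""
    try:
        tokens.get_nowait()
    except queue.Empty:
        return False  # no tokens left, no service
    return True


if __name__ == "__main__":
    tokens = make_token_queue(2)
    print([handle_request(tokens) for _ in range(3)])  # [True, True, False]
```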

Another option is a service that issues allowance to every resource server in batches: your resource servers ask it for quota and later report back their usage.
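A sketch of the batched-allowance idea under simplifying assumptions: a single in-memory `AllowanceService` with a fixed batch size, single-threaded resource servers, and hypothetical `lease`/`report` methods standing in for whatever RPCs a real service would expose:

```python
import threading


class AllowanceService:
    """Central service that hands quota to resource servers in batches."""

    def __init__(self, total_quota: int, batch_size: int) -> None:
        self._unallocated = total_quota
        self._batch_size = batch_size
        self._used = 0
        self._lock = threading.Lock()

    def lease(self) -> int:
        """Grant up to one batch of tokens; 0 means nothing is left to give."""
        with self._lock:
            granted = min(self._batch_size, self._unallocated)
            self._unallocated -= granted
            return granted

    def report(self, used: int, returned: int) -> None:
        """Record usage and take back tokens the server did not spend."""
        with self._lock:
            self._used += used
            self._unallocated += returned

    @property
    def used(self) -> int:
        with self._lock:
            return self._used


class ResourceServer:
    """Serves requests from a locally held lease (single-threaded sketch)."""

    def __init__(self, allowance: AllowanceService) -> None:
        self._allowance = allowance
        self._local = 0  # leased tokens not yet spent
        self._spent = 0  # tokens spent since the last report

    def try_serve(self) -> bool:
        if self._local == 0:
            self._local = self._allowance.lease()
            if self._local == 0:
                return False
        self._local -= 1
        self._spent += 1
        return True

    def flush(self) -> None:
        """Report usage and return unspent tokens, e.g. periodically or on shutdown."""
        self._allowance.report(used=self._spent, returned=self._local)
        self._spent = 0
        self._local = 0
```

The sketch makes two trade-offs visible: tokens leased to a server that crashes before `flush()` are never reported back (a lost-token failure), and a server can be refused while another server still holds unspent tokens.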

Either way, it is quite challenging to achieve "exactly once" processing. There are many different failure modes, and they may lead to some tokens being either lost or double spent.
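One common way to limit double spending on retries (a general technique, not something this answer prescribes) is to charge each request ID at most once. A minimal in-memory sketch, assuming clients attach a unique `request_id` to every request and resend the same ID when they retry:

```python
import threading
from typing import Dict


class IdempotentQuota:
    """Quota counter that charges each request ID at most once."""

    def __init__(self, limit: int) -> None:
        self._remaining = limit
        self._seen: Dict[str, bool] = {}  # request_id -> decision already made
        self._lock = threading.Lock()

    def consume(self, request_id: str) -> bool:
        with self._lock:
            if request_id in self._seen:
                # A retry of a request we already decided on: replay the
                # earlier answer instead of charging the quota again.
                return self._seen[request_id]
            allowed = self._remaining > 0
            if allowed:
                self._remaining -= 1
            self._seen[request_id] = allowed
            return allowed

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining
```

The `_seen` map grows without bound here; a real system would expire old IDs. It also does nothing about tokens lost when a server crashes after taking them.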

I would like to dedicate the last part to some logical steps:

1. A request from a customer arrives at a server for processing.
2. Can the server make a local decision on quota? If yes, the server holds some part of the quota, and that part needs to be updated somehow. Otherwise, the server has to ask another service for quota.
3. The server asks a service: may I process another request? This request may stay in the same region or travel to another region. Are you OK with a cross-region request (latency and availability risk)? If yes, go for it. If not:
4. The quota service has to be regionalized. How will this service shard the quota across regions? Maybe split the quota and exchange updates periodically (e.g. every second).

And so on, always favoring simplicity.
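A sketch of the first part of step 4: splitting one account's quota across regions in proportion to expected traffic. The region names and integer weights are hypothetical; the largest-remainder rounding keeps the shares summing to exactly the purchased quota, which matters for a hard limit:

```python
from typing import Dict


def split_quota(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split one account's quota across regions in proportion to `weights`.

    Largest-remainder rounding: every region gets the floor of its exact
    share, and the few leftover units go to the largest remainders, so the
    shares always add up to exactly `total` (no over- or under-issuing).
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    shares: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for region, weight in weights.items():
        shares[region], remainders[region] = divmod(total * weight, weight_sum)
    leftover = total - sum(shares.values())
    for region in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[region] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical five regions, weighted by expected share of traffic.
    print(split_quota(10_000_000, {"us-east": 3, "us-west": 2, "eu": 2, "ap": 2, "sa": 1}))
```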

Personally, I would go with a quota service deployed in every region, plus a synchronization flow between them to make sure no tokens are wasted.
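A rough sketch of that design, assuming each region's quota service holds a slice of the account's quota and a periodic `sync` job (hypothetical) pools the unspent tokens and reassigns them in proportion to each region's usage since the last sync. It ignores network failures and requests that arrive during a sync, which a real synchronization flow would have to handle:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RegionalQuota:
    """One region's slice of an account's quota (hypothetical model)."""

    remaining: int
    used_since_sync: int = 0

    def try_consume(self) -> bool:
        # Local decision only: no cross-region call on the request path.
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        self.used_since_sync += 1
        return True


def sync(regions: Dict[str, RegionalQuota]) -> None:
    """Pool every region's unspent tokens and reassign them by recent usage.

    Only what is left gets redistributed, so the account can never be
    served more than its purchased quota; idle tokens move to busy regions.
    """
    pool = sum(r.remaining for r in regions.values())
    demand = {name: r.used_since_sync for name, r in regions.items()}
    if sum(demand.values()) == 0:
        demand = {name: 1 for name in regions}  # no traffic: split evenly
    total_demand = sum(demand.values())
    shares: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for name, d in demand.items():
        shares[name], remainders[name] = divmod(pool * d, total_demand)
    # Largest-remainder rounding keeps the shares summing to exactly `pool`.
    leftover = pool - sum(shares.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[name] += 1
    for name, region in regions.items():
        region.remaining = shares[name]
        region.used_since_sync = 0
```

A region with no recent traffic gets nothing until the next sync, so a real version might keep a small minimum per region.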

That's an interesting design, putting tokens on a queue. The requirement is a hard stop, as this number corresponds to billing the customer. Additionally, the count cannot get out of sync if we replicate tokens across multiple regions. Could we use a new quota service with a NoSQL DB like Cassandra to maintain the quota count for each account? Would this hurt latency if the server and the DB are deployed in different regions? — Vikas Adyar, Mar 23 '22 at 05:15
As soon as you typed "hard stop", that ruled out all systems without strong consistency (like Cassandra). From there you have to maintain strong consistency, and the challenges you run into are scalability and availability. If you are OK with a cross-region call (higher latency and availability risk), go for it. Otherwise, have region-local services issue tickets and a separate service keep them all in sync. — AndrewR, Mar 23 '22 at 16:41
Also, a comment about the hard stop: is every request expensive? If a customer does go over quota, is that something you really need to bill for, or is it just "a cost of doing business"? E.g., if every request costs 1 cent, 10M requests cost $100,000; if a customer goes over by 1,000 requests, that is just an extra $10 for your business to absorb. Is that a problem? The reason I keep coming back to this topic is that I see many teams get locked into requirements without exploring the actual business impact. Usually it is cheaper to under-bill some customers but keep operating a simpler system. — AndrewR, Mar 23 '22 at 16:43
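To make the "strong consistency" point from the thread concrete: with a single authoritative store, the check and the decrement can be one atomic conditional update, so two concurrent requests cannot both take the last unit. A sketch using Python's built-in `sqlite3` purely as a stand-in for whatever strongly consistent database you pick; the table and function names are hypothetical:

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (stand-in) authoritative quota store in autocommit mode."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quotas ("
        " account TEXT PRIMARY KEY,"
        " remaining INTEGER NOT NULL CHECK (remaining >= 0))"
    )
    return conn


def set_quota(conn: sqlite3.Connection, account: str, amount: int) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO quotas (account, remaining) VALUES (?, ?)",
        (account, amount),
    )


def try_consume(conn: sqlite3.Connection, account: str) -> bool:
    # Check and decrement in ONE conditional statement: the row is only
    # updated while remaining > 0, so the last unit can be taken only once.
    cur = conn.execute(
        "UPDATE quotas SET remaining = remaining - 1"
        " WHERE account = ? AND remaining > 0",
        (account,),
    )
    return cur.rowcount == 1
```

Every request now pays a round trip to that one store, which is exactly the cross-region latency and availability trade-off discussed in the thread.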
