Here is a "rate limiter" that does not require a decorator, so you can create two separate rate limiters and use one or the other depending on the method being called. An instance of this class can be shared across multiple threads, since internally it uses a lock to keep its state consistent. You can also use a "managed" version of this class to share a rate limiter across multiple processes.
First the class:
from multiprocessing.managers import BaseManager
from collections import deque
from threading import Lock
import time

class RateLimiter:
    def __init__(self, call_count, period=1.0):
        self._call_count = int(call_count)
        self._period = float(period)
        self._called_timestamps = deque()
        self._lock = Lock()

    def throttle(self):
        with self._lock:
            while True:
                now = time.monotonic()
                # Discard timestamps older than one period:
                while self._called_timestamps:
                    time_left = self._called_timestamps[0] + self._period - now
                    if time_left >= 0:
                        break
                    self._called_timestamps.popleft()
                # If we are under the limit, we may proceed:
                if len(self._called_timestamps) < self._call_count:
                    break
                # Otherwise wait for the oldest timestamp to age out
                # and then re-check:
                time.sleep(time_left)
            self._called_timestamps.append(now)

# A "managed" RateLimiter is required for use with multiprocessing:
class RateLimiterManager(BaseManager):
    pass

RateLimiterManager.register('RateLimiter', RateLimiter)
And then your code would be modified as follows:
get_rate_limiter = RateLimiter(5, 1.0)
put_post_delete_rate_limiter = RateLimiter(1, 10.0)

def callrest(method, url, data):
    rate_limiter = get_rate_limiter if method == 'GET' else put_post_delete_rate_limiter
    rate_limiter.throttle()
    ...
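To see the limiter in action, here is a minimal, self-contained demonstration (the class is repeated so the snippet runs on its own): ten threads share one 5-calls-per-second limiter, so the first five calls proceed at once and the remaining five must wait out a full period.

```python
import time
from collections import deque
from threading import Lock, Thread

class RateLimiter:
    def __init__(self, call_count, period=1.0):
        self._call_count = int(call_count)
        self._period = float(period)
        self._called_timestamps = deque()
        self._lock = Lock()

    def throttle(self):
        with self._lock:
            while True:
                now = time.monotonic()
                while self._called_timestamps:
                    time_left = self._called_timestamps[0] + self._period - now
                    if time_left >= 0:
                        break
                    self._called_timestamps.popleft()
                if len(self._called_timestamps) < self._call_count:
                    break
                time.sleep(time_left)
            self._called_timestamps.append(now)

limiter = RateLimiter(5, 1.0)
timestamps = []
timestamps_lock = Lock()

def worker():
    limiter.throttle()
    with timestamps_lock:
        timestamps.append(time.monotonic())

start = time.monotonic()
threads = [Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# 10 calls at 5 calls/second: the second batch of 5 has to wait for
# the first batch's timestamps to age out, so this takes over a second.
print(f'10 calls took {elapsed:.2f}s')
assert elapsed >= 0.9
```

Because all the contention happens under a single lock, the threads serialize on `throttle` and the ten calls land in two clusters roughly one period apart.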
General Note About Rate Limiters
Suppose you have a function foo that calls some web service but, just as an example, cannot exceed 2 calls per second, and you were able to use the PyPI ratelimit package. Then your code would look something like the following:
@sleep_and_retry
@limits(calls=2, period=1)
def foo():
    do_some_calculations()
    call_web_service()
    do_some_more_calculations()
The rate limiter will (at least it should) ensure that foo cannot be called more than twice in any one-second interval. But invoking foo is not the same thing as calling the web service, since some time will elapse between the invocation of foo and the actual call to the web service. The problem is that this elapsed time is, in principle, variable. This means we can't be 100% sure that the actual web service is not being invoked a third time within some given one-second window.
Now the web service may very well have some degree of tolerance built into its rules to handle this possibility. If not, it seems to me that one would want to err on the side of caution and perhaps use a slightly larger time interval. For example, we might want to use:
@sleep_and_retry
@limits(calls=2, period=1.1)
def foo():
    do_some_calculations()
    call_web_service()
    do_some_more_calculations()
This is why I consider it preferable to put the throttling as close as possible to the actual web service call (as my solution allows) rather than decorating the enclosing function, since this should reduce the variability in timing to some degree.
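To make the contrast concrete, here is a small sketch (the function names are placeholders for your own code; `throttle` stands for any callable that blocks until a call is permitted). The decorator approach effectively throttles at function entry, before the variable-duration pre-call work; the inline approach throttles immediately before the request:

```python
import random
import time

def do_some_calculations():
    # Placeholder for pre-call work whose duration varies from call to call.
    time.sleep(random.uniform(0.0, 0.05))

def call_web_service():
    pass  # placeholder for the actual HTTP request

# Decorator style: throttling happens at function entry, so the gap
# between the throttle and the actual request includes the variable
# pre-call work.
def foo_decorator_style(throttle):
    throttle()
    do_some_calculations()
    call_web_service()

# Inline style: throttling happens right at the call site, with no
# variable-duration work between the throttle and the request.
def foo_inline_style(throttle):
    do_some_calculations()
    throttle()
    call_web_service()
```

In the decorator version, the request can land anywhere inside the window opened by `throttle`; in the inline version it lands essentially at the moment the throttle releases.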
I welcome any comments on this issue.
Example Using Multiprocessing
Here is how you would use the RateLimiter class if callrest were a multiprocessing worker function:
from collections import deque
from threading import Lock
import time

class RateLimiter:
    ...  # class code omitted for brevity

def init_pool_processes(*args):
    global get_rate_limiter
    global put_post_delete_rate_limiter

    get_rate_limiter, put_post_delete_rate_limiter = args

def callrest(method, url, data):
    rate_limiter = get_rate_limiter if method == 'GET' else put_post_delete_rate_limiter
    rate_limiter.throttle()
    return time.time()

# A "managed" RateLimiter is required for use with multiprocessing:
from multiprocessing.managers import BaseManager

class RateLimiterManager(BaseManager):
    pass

if __name__ == '__main__':
    from multiprocessing import Pool

    RateLimiterManager.register('RateLimiter', RateLimiter)
    with RateLimiterManager() as manager:
        get_rate_limiter = manager.RateLimiter(5, 1.0)
        put_post_delete_rate_limiter = manager.RateLimiter(1, 10.0)
        pool = Pool(10, initializer=init_pool_processes, initargs=(get_rate_limiter, put_post_delete_rate_limiter))
        results = [pool.apply_async(callrest, args=('GET', None, None)) for _ in range(10)]
        for result in results:
            print(result.get())
        pool.close()
        pool.join()
Prints:
1680525645.944291
1680525645.9452908
1680525645.9452908
1680525645.9452908
1680525645.9452908
1680525646.960845
1680525646.960845
1680525646.9618495
1680525646.9648433
1680525646.9658465