I have a standard socketserver that looks a bit like:
import time
import socketserver
import threading
import io


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        return cache.handle(self.rfile, self.wfile)


class Cache:
    def __init__(self):
        self._runner = Runner(self.reload)
        self._runner.start()
        self.cache = {}

    def reload(self):
        # very long process that takes up to 90 minutes and 100 GB of RAM
        # involves calls to pyodbc and some python processing
        # here is a dummy process
        cache = {}
        for i in range(90):
            cache[str(i).encode()] = bytes(10**9)
            time.sleep(60)
        self.cache = cache

    @staticmethod
    def _send_bytes(wfile: io.BufferedIOBase, msg: bytes) -> None:
        wfile.write(len(msg).to_bytes(4, "big"))
        wfile.write(msg)

    def handle(self, rfile, wfile):
        request = rfile.readline()
        response = self.cache.get(request, b'')
        self._send_bytes(wfile, response)


class Runner(threading.Thread):
    """class that runs a process every `timer` seconds (1 day by default)"""

    def __init__(self, proc, timer=24 * 60 * 60):
        super().__init__(target=self._target)
        self.proc = proc
        self.event = threading.Event()
        self.timer = timer

    def _target(self):
        while not self.event.wait(self.timer):
            self.proc()


if __name__ == '__main__':
    cache = Cache()
    with socketserver.TCPServer(("0.0.0.0", 48888), Handler) as server:
        server.socket.settimeout(30)
        server.serve_forever()
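For reference, a client has to mirror the framing that `_send_bytes` uses: a 4-byte big-endian length prefix followed by the payload. A minimal sketch of the reading side (the helper name `recv_response` is my own, not part of the server code):

```python
import io


def recv_response(rfile: io.BufferedIOBase) -> bytes:
    # Mirror of Cache._send_bytes: read a 4-byte big-endian length,
    # then read exactly that many payload bytes.
    header = rfile.read(4)
    if len(header) < 4:
        raise ConnectionError("short read on length header")
    length = int.from_bytes(header, "big")
    return rfile.read(length)


# Round-trip through an in-memory buffer instead of a real socket:
buf = io.BytesIO(len(b"hello").to_bytes(4, "big") + b"hello")
assert recv_response(buf) == b"hello"
```

On a real connection you would pass `sock.makefile("rb")` as `rfile`, and loop on `read` until `length` bytes have actually arrived.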
Here's the problem: every time the reload runs in its own thread, the server becomes exceptionally slow at responding to requests (of which it gets a few a minute). In fact it becomes near unresponsive (the clients time out), and the whole system gets into a state where every request takes so long to answer that a backlog of clients builds up and never gets served. The server effectively stops handling requests (though the loader still runs every 24 hours).
My understanding of Python is that even though the loader thread takes a long time, it shouldn't be hogging the GIL like this, but that seems to be what is happening.
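If the GIL really is the bottleneck, one cheap experiment is the interpreter's switch interval: a thread running pure-Python bytecode holds the GIL for up to 5 ms (the default) before the interpreter asks it to yield, and a long tight loop keeps winning that slice back. Lowering the interval is a hedged mitigation to try, not a fix for the underlying contention:

```python
import sys

# The default switch interval is 0.005 s: a thread running pure-Python
# bytecode keeps the GIL for up to 5 ms before being asked to yield.
default = sys.getswitchinterval()

# Offer the GIL to waiting threads roughly 10x more often, trading some
# context-switch overhead for better responsiveness of the server thread.
sys.setswitchinterval(0.0005)
```

This only changes how often the GIL is *offered*; a loop allocating gigabyte-sized objects can still stall other threads for the duration of each allocation.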
Edit:
This is (roughly) what the reload method above calls:
import collections
from typing import List, Any, Dict
import pyodbc

CONN_STR = "foo"
QUERY_STR = "bar"


def load_all() -> Dict[bytes, List["ValueObject"]]:
    # this step takes ~10 minutes
    # it connects to a SQL Server db using pyodbc
    conn: pyodbc.Connection
    with pyodbc.connect(CONN_STR) as conn:
        cursor: pyodbc.Cursor = conn.execute(QUERY_STR)
        rows: List[Any] = cursor.fetchall()

    # this step occurs entirely in pure python (a small amount gets delegated to pandas)
    # it is almost entirely building objects/dicts/lists using information from the rows
    # it makes no external calls and performs no IO (other than logging)
    # it takes ~40 minutes normally
    # (ValueObject and grouper_func are defined elsewhere)
    d = {}
    for row in rows:
        value = ValueObject.build(row)
        d[value.key] = value
    results = collections.defaultdict(list)
    for k, v in d.items():
        results[grouper_func(k)].append(v)
    return results
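If the ~40-minute pure-Python build loop is what starves the handler, one workaround is to give the GIL away explicitly at regular points, since `time.sleep` releases it for the full sleep duration. A hedged sketch of the same loop shape (`build_with_yields`, `pause_every`, and `pause` are my names, not part of the code above):

```python
import time


def build_with_yields(rows, build, pause_every=10_000, pause=0.001):
    # Same shape as the first loop in load_all, but sleeps briefly every
    # pause_every rows; time.sleep releases the GIL, so the server thread
    # gets a guaranteed scheduling window between chunks.
    d = {}
    for i, row in enumerate(rows, start=1):
        value = build(row)
        d[value.key] = value
        if i % pause_every == 0:
            time.sleep(pause)
    return d
```

With millions of rows this adds only seconds of total sleep, while ensuring the loader can never monopolize the interpreter for the whole 40 minutes.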