
TL;DR

How do I safely await the execution of a function (one that takes a str and an int as arguments and requires no other context) in a separate process?

Long story

I have an aiohttp.web web API that uses a Boost.Python wrapper around a C++ extension; it runs under Gunicorn (and I plan to deploy it on Heroku) and is load-tested with Locust.

About the extension: it has just one function, and it performs a blocking operation. It takes one string (and one integer for timeout management), does some calculations on it, and returns a new string. For every input string there is exactly one possible output (except on timeout, in which case a C++ exception must be raised and translated by Boost.Python into a Python-compatible one).

In short, the handler for a specific URL executes the code below:

res = await loop.run_in_executor(executor, func, *args)

where executor is a ProcessPoolExecutor instance and func is the function from the C++ extension module. (In the real project this code sits in a coroutine method of a class, and func is a classmethod that only executes the C++ function and returns the result.)
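For illustration, here is a minimal, self-contained sketch of that setup. The compute function, the /calc route, and the 'data' field are placeholders standing in for the real extension and handler, not the project's actual code:

import asyncio
from concurrent.futures import ProcessPoolExecutor

from aiohttp import web


def compute(data: str, timeout: int) -> str:
    # Placeholder for the Boost.Python extension function:
    # takes a string plus a timeout and returns a new string.
    return data.upper()


executor = ProcessPoolExecutor(max_workers=4)


async def handle(request):
    post = await request.post()
    loop = asyncio.get_event_loop()
    # Run the blocking C++ call in a separate process so the event loop
    # stays responsive while the calculation runs.
    res = await loop.run_in_executor(executor, compute, post['data'], 5)
    return web.Response(text=res)


app = web.Application()
app.router.add_post('/calc', handle)

if __name__ == '__main__':
    web.run_app(app)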

Error catching

When a new request arrives, I extract its POST data with request.post() and store it in an instance of a custom class named Call (because I have no idea what else to name it). That call object contains all the input data (the string), the time the request was received, and a unique id that comes with the request.
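A minimal sketch of what such a Call object might look like; the field names here are my assumption, not the project's actual attributes:

import time
import uuid


class Call:
    """Carries one request's input and bookkeeping data through the app."""

    def __init__(self, data, request_id=None):
        self.data = data                            # input string from POST data
        self.received_at = time.time()              # request receiving time
        self.id = request_id or str(uuid.uuid4())   # unique id per request
        self.result = None                          # output, filled in after execution
        self.error = None                           # filled in if execution fails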

Then it proceeds to a class named Handler (not an aiohttp request handler), which passes its input to another class's method that calls loop.run_in_executor. Handler also has a logging system that works like middleware: it reads the id and receiving time of every incoming call object and logs them with a message saying whether the call is just starting to execute, has executed successfully, or has run into trouble. In addition, Handler has a try/except block and stores all errors inside the call object, so the logging middleware knows which error occurred, or which output the extension returned.
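A rough sketch of the shape described above; the method names and the runner object are my guesses, not the real code:

import logging

log = logging.getLogger(__name__)


class Handler:
    """Runs a Call through the executor and logs its lifecycle."""

    def __init__(self, runner):
        # runner is the object whose coroutine method wraps loop.run_in_executor
        self.runner = runner

    async def process(self, call):
        log.info('call %s (received %s): starting', call.id, call.received_at)
        try:
            call.result = await self.runner.run(call.data)
        except Exception as exc:
            # Store the error on the call object so the logging "middleware"
            # can report exactly what went wrong for this id.
            call.error = exc
            log.error('call %s: failed with %r', call.id, exc)
        else:
            log.info('call %s: executed successfully, output %r', call.id, call.result)
        return call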

Testing

I have a unit test that just creates 256 coroutines with this code inside, backed by an executor with 256 workers, and it works well.
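The test looks roughly like this (a sketch; it assumes the extension function can be imported as compute and uses the '2 + 2' -> '4' example from the comments below as the known expected output):

import asyncio
import unittest
from concurrent.futures import ProcessPoolExecutor

from my_extension import compute  # module and function names are assumed


class TestParallelCalls(unittest.TestCase):
    def test_256_parallel_calls(self):
        loop = asyncio.get_event_loop()
        executor = ProcessPoolExecutor(max_workers=256)

        async def one_call():
            return await loop.run_in_executor(executor, compute, '2 + 2', 5)

        results = loop.run_until_complete(
            asyncio.gather(*[one_call() for _ in range(256)]))
        # Every request sends the same input, so every output must match.
        self.assertEqual(results, ['4'] * 256)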

But a problem appears when testing with Locust. I use 4 Gunicorn workers and 4 executor workers for this kind of testing. At some point the application just starts to return wrong output.

My Locust TaskSet is configured to log every failed response with all the available information: output string, error string, input string (which the application returns as well), and id. All simulated requests are identical, but the id is unique for each one.

The situation gets better when Gunicorn's max_requests option is set to 100, but failures still occur.
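For reference, the invocation I mean looks roughly like this (the app:app module and app names are placeholders for my actual entry point):

gunicorn app:app --workers 4 --max-requests 100 --worker-class aiohttp.GunicornWebWorker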

Interestingly, sometimes I can trigger a period of "wrong output" simply by stopping and restarting the Locust test.

I need a 100% guarantee that my web API works as I expect.

UPDATE & solution

I just asked my teammate to review the C++ code: the problem was global variables. Somehow it wasn't a problem for 256 parallel coroutines, but it was under Gunicorn. My guess is that with 256 executor workers for 256 calls, each worker process executed the function at most once, so stale global state never carried over between calls; with only 4 workers serving many requests each, leftover state from one call could corrupt the next.
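To illustrate the failure mode in Python terms (this is an analogy I constructed, not the actual C++ code): a module-level global that is mutated during a call leaks into the next call handled by the same worker process.

_buffer = []  # module-level global, shared across calls within one process


def compute(data, timeout):
    # BUG: _buffer is never cleared, so input from a previous call handled
    # by this worker process leaks into the current call's result.
    _buffer.append(data)
    return ' '.join(_buffer)

# First call in a worker:   compute('2 + 2', 5) -> '2 + 2'           (looks fine)
# Second call, same worker: compute('2 + 2', 5) -> '2 + 2 2 + 2'     (wrong output)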

Illia Ananich
  • I don't know why this is tagged [tag:locust]. Basically it sounds like your web service doesn't scale? – enderland Jul 25 '17 at 20:02
  • @enderland scaling isn't as necessary as getting the expected output from the web API. I have added a note about Locust's test in the last edit. – Illia Ananich Jul 25 '17 at 20:29
  • Did you log the results of run_in_executor to be sure the problem is in the C code? It's not clear what "wrong output" the app returns. Maybe the server just can't handle this number of parallel requests, regardless of the executor stuff. – Mikhail Gerasimov Jul 28 '17 at 02:23
  • @GerasimovMikhail The C++ code does calculations, so, for example, if it does arithmetic, then for the input string '2 + 2' it must return '4' every time. The Locust test sends hundreds of requests with '2 + 2', so I expect every response to be '4', but that's not what happens. I have updated the question with information about logging and a sort-of solution. Should I rename the question? – Illia Ananich Jul 28 '17 at 07:13

0 Answers