Slow access to Django's request.body

Question

Sometimes this line of a Django app (hosted with Apache/mod_wsgi) takes a long time to execute (e.g. 99% of a 6-second request, as measured by New Relic) when the request is submitted by some mobile clients:

raw_body = request.body

(where request is an incoming request)

The questions I have:

1. What could have slowed down access to request.body so much?
2. How should Apache be configured so that it waits until the client has sent the whole payload before invoking Django? Maybe the problem is in the Apache configuration.

Django's body attribute on HttpRequest is a property, so the question really comes down to what it does under the hood and how to make that happen outside the Django app, if possible. I want Apache to wait for the full request before passing it to the Django app.

score 10 · Answer 1 · edited May 23 '17 at 12:32

Regarding (1), Apache passes control to the mod_wsgi handler as soon as the request's headers are available, and mod_wsgi then passes control on to Python. The internal implementation of request.body then calls the read() method which eventually calls the implementation within mod_wsgi, which requests the request's body from Apache and, if it hasn't been completely received by Apache yet, blocks until it is available.
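To make the blocking behaviour concrete, here is a minimal sketch that simulates it outside of Apache. `SlowClientStream` and `timed_read` are hypothetical names invented for this illustration, not part of Django or mod_wsgi; the sleep stands in for waiting on a slow mobile uplink.

```python
import time


class SlowClientStream:
    """Hypothetical stand-in for a request body that trickles in from a slow client."""

    def __init__(self, payload, chunk_size, delay_seconds):
        # Split the payload into the chunks the "network" will deliver.
        self._chunks = [
            payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)
        ]
        self._delay_seconds = delay_seconds

    def read(self):
        """Return the whole body, blocking until every chunk has 'arrived'."""
        received = []
        for chunk in self._chunks:
            time.sleep(self._delay_seconds)  # simulated wait for the client
            received.append(chunk)
        return b"".join(received)


def timed_read(stream):
    """Read the full body and report how long the call blocked."""
    start = time.perf_counter()
    body = stream.read()
    elapsed = time.perf_counter() - start
    return body, elapsed


if __name__ == "__main__":
    stream = SlowClientStream(b"x" * 1000, chunk_size=100, delay_seconds=0.01)
    body, elapsed = timed_read(stream)
    print(len(body), "bytes read in", round(elapsed, 3), "seconds")
```

Nearly all of the elapsed time is spent waiting for chunks, not processing them, which matches a profiler attributing the whole delay to the single line that touches request.body.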

Regarding (2), this is not possible with mod_wsgi alone. At least, the hook that processes incoming requests doesn't provide a mechanism to block until the full request is available. Another poster suggested using nginx as a reverse proxy in a response to this duplicate question.

answered Jun 04 '14 at 14:10

Phillip

(1) Unfortunately, that is what I suspected (the alternative was some really strange stuff happening on the server), so thanks for pointing to the exact place in the source code. (2) A reverse proxy may be a nice idea, thanks for pointing that out. Could you tell whether nginx can also hold back the request when it is not used only as a proxy, but has uWSGI configured to host the Django app? I suppose yes. – Tadeck Jun 04 '14 at 22:11
I am not an expert but AFAIK: nginx calls the body `$request_body` internally, and how it is read is at least in parts [configurable](http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size). The variable's documentation states that it is [passed on to uwsgi](http://nginx.org/en/docs/http/ngx_http_core_module.html#var_request_body), which sounds to me as if pre-buffering is always enabled. I don't know how much of the request is buffered though. [read_request_body](http://wiki.nginx.org/HttpEchoModule#echo_read_request_body) might help here. – Phillip Jun 05 '14 at 08:47
Perhaps setting the [uWSGI `post-buffering` option](http://uwsgi-docs.readthedocs.org/en/latest/Options.html#post-buffering) could help here? Have uWSGI buffer the request body before it invokes the app itself. – Martijn Pieters Jun 08 '14 at 00:03

score 9 · Accepted Answer · edited May 23 '17 at 12:32

There are two ways you can fix this in Apache.

You can use mod_buffer, available in Apache >= 2.3, and set BufferSize to the maximum expected payload size. This should make Apache hold the request in memory until either the client has finished sending it or the buffer is full.

For Apache versions older than 2.3, you can use mod_proxy combined with ProxyIOBufferSize, ProxyReceiveBufferSize and a loopback vhost. This involves putting your real vhost on a loopback interface and exposing a proxy vhost that connects back to the real vhost. The downside is that it uses twice as many sockets and can make resource calculation difficult.

However, the ideal choice would be to enable request/response buffering at your L4/L7 load balancer. For example, haproxy lets you add rules based on req_len, and the same goes for nginx. Most good commercial load balancers also have an option to buffer requests before sending.

All three approaches rely on buffering the full request/response payload, and there are performance considerations depending on your use case and available resources. You could cache the entire payload in memory but this may dramatically decrease your maximum concurrent connections. You could choose to write the payload to local storage (preferably SSD), but you are then limited by IO capacity.
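A back-of-the-envelope sketch of the memory trade-off, assuming the worst case where every connection fills its buffer at the same time. The function names and the example numbers are made up for illustration.

```python
def worst_case_buffer_memory(concurrent_connections, buffer_size_bytes):
    """Memory needed if every connection fills its in-memory buffer at once."""
    return concurrent_connections * buffer_size_bytes


def max_connections_within_budget(memory_budget_bytes, buffer_size_bytes):
    """How many fully buffered requests fit into a given memory budget."""
    return memory_budget_bytes // buffer_size_bytes


if __name__ == "__main__":
    mib = 1024 * 1024
    # Hypothetical numbers: 1 MiB buffer per request, 512 MiB set aside for buffering.
    print(worst_case_buffer_memory(1000, 1 * mib) / mib, "MiB for 1000 connections")
    print(max_connections_within_budget(512 * mib, 1 * mib), "connections fit in 512 MiB")
```

The point is only that the buffer size multiplies with concurrency, so a buffer sized for the largest expected payload can cap the number of simultaneous requests well below what the server would otherwise handle.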

You also need to consider file uploads, because these are not a good fit for memory based payload buffering. In most cases, you would handle upload requests in your webserver, for example HttpUploadModule, then query nginx for the upload progress, rather than handling it directly in WSGI. If you are buffering at your load balancer, then you may wish to exclude file uploads from the buffering rules.

You need to understand why this is happening, and that this problem exists both when sending a response and receiving a request. It's also a good idea to have these protections in place, not just for scalability, but for security reasons.

Cool answer! I am upvoting and accepting, even though [the other answer](http://stackoverflow.com/a/24039774/548696) is also nice. Will verify both answers and change selection if necessary. Thanks! — Tadeck, Jun 11 '14 at 01:46
No problem at all, if you need any further clarification just let me know :) — SleepyCal, Jun 11 '14 at 11:59

Davide · Answer 3 · 2014-06-06T21:10:27.920

I'm afraid the problem could be in the amount of data you are transferring and possibly a slow connection. Also note that upload bandwidth is typically much less than download bandwidth.

As already pointed out, when you use request.body, Django waits for the whole body to be transferred from the client and made available in memory (or on disk, depending on configuration and size) on the server.

I would suggest trying the same request with the client connected to a WiFi access point that is wired to the server itself, and seeing whether it improves greatly. If this is not possible, perhaps just run a tool like speedtest.net on the client, get the request size and do the math to see how much time it would theoretically require (I'd expect the measured time to be roughly 20% higher). Be careful: network speed is often measured in bits per second, while file size is measured in bytes.
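The math can be sketched like this. `estimate_upload_seconds` is a hypothetical helper, the 20% overhead is just the rough guess from above, and the example payload and uplink speed are made-up numbers.

```python
def estimate_upload_seconds(payload_bytes, uplink_bits_per_second, overhead=0.2):
    """Rough transfer-time estimate for a request body.

    payload_bytes is the body size in bytes; uplink_bits_per_second is the
    client's upload speed in bits per second (what speed tests usually report).
    overhead is an assumed fudge factor for protocol and latency costs.
    """
    payload_bits = payload_bytes * 8  # bytes -> bits
    ideal_seconds = payload_bits / uplink_bits_per_second
    return ideal_seconds * (1 + overhead)


if __name__ == "__main__":
    # Hypothetical example: 150,000-byte payload over a 1 Mbit/s uplink.
    print(round(estimate_upload_seconds(150_000, 1_000_000), 2), "seconds")
```

If the measured time is close to this estimate, the delay is simply the client's uplink and not something wrong on the server.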

In some cases, if a lot of processing is needed on the data, it may be convenient to read() the request and do computations on the go, or perhaps pass the request object directly to any function that can read from a so-called "file-like object" instead of a string.

In your specific case, however, I'm afraid this would only affect that 1% of time that is not spent in receiving the body from the network.
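For reference, processing on the go could look like the sketch below. It assumes only that the object has a read(size) method, which Django's HttpRequest provides; `sha256_of_stream` is a name made up for this example, and io.BytesIO stands in for the request so the snippet runs on its own.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=64 * 1024):
    """Hash a file-like object chunk by chunk without holding it all in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # In a view you would pass the request object itself instead of BytesIO.
    fake_request = io.BytesIO(b"hello world")
    print(sha256_of_stream(fake_request, chunk_size=4))
```

As far as I know, Django refuses access to request.body once the stream has been read from, so choose one approach per request.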

Edit:

Sorry, only now have I noticed the extra description in the bounty. I'm afraid I can't help you but, may I ask, what is the point? I'd guess this would only save a tiny bit of server resources by not keeping a Python thread idle for a while, without any noticeable performance gain on the request...

To answer your question on "what's the point", read my accepted answer. — SleepyCal, Jun 12 '14 at 10:46

score 0 · Answer 4 · answered Jun 09 '14 at 10:23

Looking at the Django source, it looks like what actually happens when you call request.body is that the request body is loaded into memory by being read from a stream.

https://github.com/django/django/blob/stable/1.4.x/django/http/__init__.py#L390-L392

If the request is large, the time is likely spent just loading it into memory. Django has methods on the request for treating the body as a stream, which, depending on the content being consumed, could let you process the request more efficiently.

https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.read

You could for example read one line at a time.
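A minimal sketch of the line-at-a-time idea, assuming the body is line-oriented text. It relies only on the object being iterable line by line, which HttpRequest supports; `count_nonblank_lines` is a made-up example function and io.BytesIO stands in for the request.

```python
import io


def count_nonblank_lines(stream):
    """Iterate over a file-like object one line at a time and count non-blank lines."""
    count = 0
    for line in stream:  # each iteration reads up to the next newline
        if line.strip():
            count += 1
    return count


if __name__ == "__main__":
    # In a view, `for line in request:` would iterate the same way.
    fake_request = io.BytesIO(b"first\n\nsecond\nthird")
    print(count_nonblank_lines(fake_request))
```

This only helps if each line can be handled independently; it does not make the client send the data any faster.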

The payload is sometimes 150k bytes and takes 8 seconds to load. Yes, accessing `request.body` reads the request content. Unfortunately, I need it all at once anyway, and I am searching for a way for the request to be fully received before it is passed to the Django app. I think http://stackoverflow.com/a/24039774/548696 is the answer closest to what I need. – Tadeck, Jun 09 '14 at 21:57
