[Fixed]-Slow access to Django's request.body

9👍

✅

There are two ways you can fix this in Apache.

You can use mod_buffer, available in Apache >= 2.3, and set BufferSize to the maximum expected payload size. This should make Apache hold the request in memory until either the client has finished sending it or the buffer is full.

For older Apache versions < 2.3, you can use mod_proxy combined with ProxyIOBufferSize, ProxyReceiveBufferSize and a loopback vhost. This involves putting your real vhost on a loopback interface, and exposing a proxy vhost which connects back to the real vhost. The downside to this is that it uses twice as many sockets, and can make resource calculation difficult.

However, the ideal choice would be to enable request/response buffering at your L4/L7 load balancer. For example, haproxy lets you add rules based on req_len, and the same goes for nginx. Most good commercial load balancers also have an option to buffer requests before forwarding them.

All three approaches rely on buffering the full request/response payload, and there are performance considerations depending on your use case and available resources. You could cache the entire payload in memory but this may dramatically decrease your maximum concurrent connections. You could choose to write the payload to local storage (preferably SSD), but you are then limited by IO capacity.
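To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch. The function name and the figures (a 2 GiB memory budget, 10 MiB buffers) are purely illustrative assumptions, not values from any real deployment:

```python
def max_buffered_connections(memory_budget_bytes, buffer_size_bytes):
    """Upper bound on requests that can be fully buffered in memory at once.

    Worst case: every connection fills its whole buffer.
    """
    if buffer_size_bytes <= 0:
        raise ValueError("buffer_size_bytes must be positive")
    return memory_budget_bytes // buffer_size_bytes


# Hypothetical numbers: 2 GiB set aside for buffering, 10 MiB per request.
limit = max_buffered_connections(2 * 1024**3, 10 * 1024**2)
print(limit)  # 204
```

Doubling the buffer size halves that ceiling, which is why large payloads are usually better spooled to disk or excluded from buffering.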

You also need to consider file uploads, because these are not a good fit for memory-based payload buffering. In most cases, you would handle upload requests in your web server, for example with nginx’s HttpUploadModule, then query nginx for the upload progress, rather than handling them directly in WSGI. If you are buffering at your load balancer, then you may wish to exclude file uploads from the buffering rules.

You need to understand why this is happening, and that this problem exists both when sending a response and receiving a request. It’s also a good idea to have these protections in place, not just for scalability, but for security reasons.

👤SleepyCal

10👍

Regarding (1), Apache passes control to the mod_wsgi handler as soon as the request’s headers are available, and mod_wsgi then passes control on to Python. The internal implementation of request.body then calls the read() method which eventually calls the implementation within mod_wsgi, which requests the request’s body from Apache and, if it hasn’t been completely received by Apache yet, blocks until it is available.
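One way to confirm that the time really goes into waiting for the body is to time the read at the WSGI level. This is only a minimal sketch: `timed_body_app` is a made-up name, and it relies on nothing beyond the standard WSGI `environ` keys `CONTENT_LENGTH` and `wsgi.input`:

```python
import time


def timed_body_app(environ, start_response):
    """Minimal WSGI app that reports how long reading the request body took."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0

    started = time.monotonic()
    # Under mod_wsgi this read blocks until Apache has received the bytes.
    body = environ["wsgi.input"].read(length)
    elapsed = time.monotonic() - started

    report = "read {} bytes in {:.3f}s\n".format(len(body), elapsed).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(report))),
    ])
    return [report]
```

If the reported time is close to the total request time, the delay is the client upload rather than anything in Django.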

Regarding (2), this is not possible with mod_wsgi alone. At least, the hook processing incoming requests doesn’t provide a mechanism to block until the full request is available. Another poster suggested using nginx as a proxy in a response to this duplicate question.

👤Phillip

0👍

I’m afraid the problem could be the amount of data you are transferring, possibly combined with a slow connection. Also note that upload bandwidth is typically much lower than download bandwidth.

As already pointed out, when you use request.body Django will wait for the whole body to be fully transferred from the client and available in-memory (or on disk, according to configurations and size) on the server.

I would suggest trying the same request with the client connected to a WiFi access point that is wired to the server itself, and seeing whether it improves greatly. If this is not possible, perhaps just run a tool like speedtest.net on the client, get the request size and do the math to see how much time it would theoretically require (I’d expect the measured time to be roughly 20% more). Be careful: network speed is often measured in bits per second, while file size is measured in bytes.
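The math is simple enough to sketch. The function below is only an illustration: its name is made up, the 20% overhead is the rough guess mentioned above, and the example figures (a 5 MB payload on a 2 Mbit/s uplink) are hypothetical:

```python
def estimated_upload_seconds(payload_bytes, uplink_mbit_per_s, overhead=0.20):
    """Rough time to upload a payload over a link of the given speed.

    payload_bytes     -- request size in bytes
    uplink_mbit_per_s -- measured upload speed in megabits per second
    overhead          -- assumed fractional overhead (protocol, latency, ...)
    """
    if uplink_mbit_per_s <= 0:
        raise ValueError("uplink_mbit_per_s must be positive")
    payload_bits = payload_bytes * 8  # bytes -> bits
    ideal_seconds = payload_bits / (uplink_mbit_per_s * 1_000_000)
    return ideal_seconds * (1 + overhead)


# Hypothetical example: 5 MB over 2 Mbit/s is 20 s ideal, about 24 s with overhead.
print(round(estimated_upload_seconds(5_000_000, 2), 1))
```

Forgetting the factor of 8 between bytes and bits makes the estimate eight times too optimistic.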

In some cases, if a lot of processing is needed on the data, it may be convenient to read() the request and do the computations on the fly, or perhaps pass the request object directly to any function that can read from a so-called “file-like object” instead of a string.
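As a minimal sketch of that idea, the helper below consumes any object with a `read(size)` method in fixed-size chunks. The helper name and the choice of a SHA-256 digest as the "processing" are assumptions for illustration; in a Django view the `request` object itself would be passed in place of the `io.BytesIO` stand-in:

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=64 * 1024):
    """Hash a file-like object chunk by chunk without holding it all in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a request body; a view would call sha256_of_stream(request).
fake_body = io.BytesIO(b"example payload " * 1000)
print(sha256_of_stream(fake_body))
```

Note that once the stream has been consumed this way, the data is gone: Django does not let you go back and access request.body afterwards.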

In your specific case, however, I’m afraid this would only affect that 1% of time that is not spent in receiving the body from the network.

Edit:

Sorry, I’ve only now noticed the extra description in the bounty. I’m afraid I can’t help you but, may I ask, what is the point? I’d guess this would only save the tiny bit of server resources used to keep a Python thread idle for a while, without any noticeable performance gain on the request…

👤Davide

0👍

Looking at the Django source, it looks like what actually happens when you access request.body is that the request body is loaded into memory by being read from a stream.

https://github.com/django/django/blob/stable/1.4.x/django/http/__init__.py#L390-L392

If the request is large, the time is likely being spent just loading it into memory. Django has methods on the request for treating the body as a stream which, depending on what exactly the content is, could allow you to process the request more efficiently.

https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.read

You could for example read one line at a time.
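A line-at-a-time loop might look like the sketch below. It only assumes the object has a `readline()` method returning bytes, which Django's HttpRequest provides; the function name and the sample data are hypothetical, and `io.BytesIO` again stands in for the request:

```python
import io


def count_nonempty_lines(stream):
    """Walk a file-like object one line at a time, counting non-blank lines."""
    count = 0
    # iter(callable, sentinel) keeps calling readline() until it returns b"".
    for line in iter(stream.readline, b""):
        if line.strip():
            count += 1
    return count


# Stand-in for a newline-delimited request body.
fake_body = io.BytesIO(b"first\n\nsecond\nthird")
print(count_nonempty_lines(fake_body))  # 3
```

This suits newline-delimited payloads such as CSV or JSON Lines, where each record can be handled as soon as it arrives.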

👤Woodham
