I’m answering my own question; perhaps someone has a better solution.
After reading gunicorn’s documentation a bit further, and learning a bit more about eventlet and gevent, I think gunicorn answers my question perfectly. Gunicorn has a master process that manages a pool of workers. Each worker is either synchronous (single-threaded, handling one request at a time) or asynchronous (handling multiple requests almost simultaneously).
Synchronous workers are very simple to understand and to debug, and if a worker fails, only one request is lost. But if a worker is stuck in a long-running external API call, it is basically sleeping. So under high load, all the workers may end up sleeping while waiting for results, and incoming requests get dropped.
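To make the problem concrete, here is a minimal sketch of the kind of blocking handler I have in mind (Flask, the requests library and the URL are just assumptions for illustration, not my actual code):

```python
import requests
from flask import Flask

app = Flask(__name__)

@app.route("/report")
def report():
    # Long-running external API call: with a sync worker, the whole worker
    # process sits idle here and can serve nothing else until it returns.
    resp = requests.get("https://api.example.com/slow-endpoint", timeout=30)
    return resp.text
```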
So the solution is to change the default worker type from synchronous to asynchronous (choosing eventlet or gevent; here’s a comparison). Now each worker runs multiple green threads, each of which is extremely lightweight. Whenever one green thread has to wait for some I/O, another one resumes execution. This is called cooperative multitasking. It’s very fast and very lightweight: a single worker can handle thousands of concurrent requests if they are mostly waiting on I/O. Exactly what I need.
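Switching worker types is just configuration. A minimal sketch of a gunicorn config file (the values are placeholders, not recommendations):

```python
# gunicorn.conf.py
bind = "0.0.0.0:8000"
workers = 4                 # a handful of worker processes
worker_class = "gevent"     # asynchronous workers instead of the default "sync"
```

and then start the server with something like gunicorn -c gunicorn.conf.py myapp:app, where myapp:app stands in for your own WSGI module and application object.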
I was wondering how I should change my existing code, but apparently the standard Python modules are monkey-patched by gunicorn on startup (actually by eventlet or gevent), so all existing code can run unchanged and still cooperate nicely with the other green threads.
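For reference, this is roughly what the patching amounts to if done by hand with gevent (gunicorn’s gevent worker does it for you at startup, so this is only to illustrate the idea):

```python
# Replace the blocking parts of the standard library (sockets, SSL,
# time.sleep, ...) with cooperative versions, so existing code yields
# to other green threads whenever it waits on I/O.
from gevent import monkey
monkey.patch_all()

import requests  # imported after patching, so it uses the patched sockets

# This call no longer blocks the whole worker; other green threads keep
# running while it waits for the response.
resp = requests.get("https://api.example.com/slow-endpoint", timeout=30)
```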
There are a bunch of parameters that can be tweaked in gunicorn, for example the maximum number of simultaneous clients using gunicorn’s worker_connections parameter, the maximum number of pending connections using the backlog parameter, etc.
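A minimal sketch of what that tuning might look like in the same config file (the numbers are placeholders, not recommendations):

```python
# gunicorn.conf.py (continued)
worker_connections = 1000   # max simultaneous clients per worker
                            # (only honored by async worker types like gevent/eventlet)
backlog = 2048              # max pending connections queued before being accepted
```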
This is just great, I’ll start testing right away!