3👍
Have you considered Celery or another standard queue/broker messaging system? django-celery is quite mature and well developed, and it is specifically designed for this kind of task.
    Django -> Celery --> Worker Process (always running)
                  ^       |-> Worker Process
                  |       `-> Worker Process -,
                  \______ Job Complete ______/
The basic idea is that you have always-running worker processes (on one or more servers) waiting for messages to come in (these can be pickled objects, JSON, or whatever you want). The processes sit idle until Celery (and its RabbitMQ backend) dishes out a message/job to them. Once the message/job is processed, a notification comes back through the broker and triggers a callback in Django, in which you update the status.
Celery is a task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on one or more worker servers. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
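A minimal sketch of what such a task might look like, assuming Celery is already wired into the Django project; the task, model, and field names here are illustrative, not anything your project already has:

```python
# tasks.py -- a hypothetical long-running task
from celery import shared_task

from myapp.models import Upload  # hypothetical model


@shared_task
def process_upload(upload_id):
    # Runs inside an always-on worker process, never in the web request.
    upload = Upload.objects.get(pk=upload_id)
    # ... do the slow work here ...
    upload.status = "done"
    upload.save()
```

From a view you would enqueue it with `process_upload.delay(upload.pk)`; the broker (RabbitMQ, in the setup described above) delivers the message to whichever worker is free.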
2👍
For this kind of messaging I like to use a real, language-agnostic message-queueing system that can be reused in future projects. Look at AMQP if you can handle having a message-queue broker in the middle managing all the queues; if you don't want a third-party broker, look at ZeroMQ.
In both cases you can send messages over a pub-sub queue that can feed several workers per queue if needed. The messages can be simple text strings (http://tnetstrings.org/), JSON objects, or, if you are careful, even pickled Python objects along with code to execute. Personally I like using JSON objects (a subset of JSON) and unpacking them into Python dicts in order to use them.
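Here is a minimal pyzmq sketch of that pub-sub idea; the endpoint, the "jobs" topic, and the message fields are made up for illustration:

```python
import json
import time
import zmq

ctx = zmq.Context()

# --- worker side: subscribe to the "jobs" topic ---
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "jobs")

# --- Django/producer side: publish a job as a JSON payload ---
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")
time.sleep(0.2)  # PUB drops messages sent before subscribers connect
pub.send_string("jobs " + json.dumps({"job_id": 42, "action": "resize"}))

# --- worker side: receive and unpack the JSON back into a dict ---
topic, _, payload = sub.recv_string().partition(" ")
job = json.loads(payload)
```

In a real deployment the two halves live in separate processes; the sleep is only there because of ZeroMQ's slow-joiner behaviour, where a PUB socket silently drops anything sent before a subscriber has connected.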
I have used both AMQP and ZeroMQ in systems with around 20 communicating Python processes. It works well, and if you need to connect to something non-Python, you will find that there is already an AMQP module and a ZeroMQ library out there.
An interesting extension of your scenario is to have three kinds of worker processes, written in Jython, CPython, and IronPython. That way you can leverage third-party Java and .NET modules as well as binary CPython modules like lxml. Combine it with something like Redis so that the processes are completely decoupled and can run on multiple servers if necessary. The workers would put their results into Redis instead of gumming up the message-queueing system with big messages interleaved with small ones; if necessary, a worker can publish a message containing a Redis key so that another process can retrieve the value.
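A sketch of that hand-off using redis-py; the key naming scheme and the "results" channel are assumptions, not any fixed convention:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# --- consumer: subscribe before anything is published ---
p = r.pubsub()
p.subscribe("results")

# --- worker: store the (possibly large) result under a key ...
result_key = "result:job:42"
r.set(result_key, b"<large result blob>")
# ... and publish only the key, keeping queue messages small ---
r.publish("results", json.dumps({"job_id": 42, "redis_key": result_key}))

# --- consumer: receive the key and fetch the value separately ---
for msg in p.listen():
    if msg["type"] == "message":
        note = json.loads(msg["data"])
        result = r.get(note["redis_key"])
        break
```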
-1👍
Pickling is fine; use cPickle for efficiency. However, you should not write the pickled data to a file. Instead, use another IPC mechanism, such as sockets or pipes (see e.g. https://stackoverflow.com/search?q=python+named+pipes), which avoids the disk overhead.
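A minimal sketch of the idea, using multiprocessing.Pipe as a stand-in for the sockets/named pipes mentioned above; the connection pickles the objects for you (via cPickle on Python 2, pickle on Python 3) without ever touching disk:

```python
from multiprocessing import Process, Pipe


def worker(conn):
    job = conn.recv()  # arrives already unpickled as a Python object
    conn.send({"job_id": job["job_id"], "status": "done"})
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send({"job_id": 42, "payload": [1, 2, 3]})  # pickled under the hood
    print(parent_conn.recv())
    p.join()
```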