[Django]-Celery: WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL)

72👍

The SIGKILL your worker received was initiated by another process. Your supervisord config looks fine, and killasgroup would only affect a supervisor-initiated kill (e.g. from supervisorctl or a plugin); without that setting supervisor would have sent the signal to the dispatcher anyway, not the child.

Most likely you have a memory leak and the OS’s OOM killer is assassinating your process for bad behavior.

Run grep oom /var/log/messages. If it turns up matches, that’s your problem.

If you don’t find anything, try running the periodic process manually in a shell:

MyPeriodicTask().run()

And see what happens. I’d monitor system and process metrics with top in another terminal if you don’t have good instrumentation like Cacti, Ganglia, etc. for this host.
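If top alone doesn’t make the growth obvious, here is a minimal sketch using the standard-library tracemalloc module to see where the allocations come from while the task runs in-process (MyPeriodicTask is the task from the answer above; everything else is stock Python):

import tracemalloc

tracemalloc.start()

MyPeriodicTask().run()          # run the suspect task in the current process

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB")

# the largest allocation sites usually point straight at the leak
for stat in tracemalloc.take_snapshot().statistics("lineno")[:10]:
    print(stat)

If the peak is far beyond what the host can spare, you have likely found what the OOM killer is reacting to.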

10👍

You see this kind of error when an asynchronous task (run through Celery), or the script you are using, keeps accumulating data in memory and effectively leaks it.

In my case, I was fetching data from another system and saving it in a variable, so I could export all of it (into a Django model / an Excel file) once the process finished.

Here is the catch: my script was gathering 10 million records, and memory kept growing while it gathered them. That is what raised the exception.

To overcome the issue, I divided the 10 million records into 20 parts of half a million each. Every time the accumulated data reached 500,000 items, I stored it in a local file / Django model of my choice and cleared it from memory, and I repeated this for every batch of 500k items.
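A minimal sketch of that batching pattern (the Record model, the value field, and the data source are hypothetical stand-ins for illustration, not the original project’s code):

from myapp.models import Record          # hypothetical app and model

BATCH_SIZE = 500_000                     # flush every half a million items

def import_in_batches(source):
    # source: any iterable/generator yielding items from the external system
    buffer = []
    for item in source:
        buffer.append(Record(value=item))
        if len(buffer) >= BATCH_SIZE:
            Record.objects.bulk_create(buffer)   # persist this batch
            buffer.clear()                       # drop it from memory before continuing
    if buffer:
        Record.objects.bulk_create(buffer)       # persist the final partial batch

Because the buffer never holds more than one batch at a time, peak memory stays roughly constant no matter how many records the source produces.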

There is no need to use that exact number of partitions. The idea is to solve a complex problem by splitting it into multiple subproblems and solving them one by one. 😀
