72
The SIGKILL your worker received was initiated by another process. Your supervisord config looks fine, and killasgroup would only affect a supervisor-initiated kill (e.g. from the ctl or a plugin). Without that setting, the signal would have gone to the dispatcher anyway, not the child.
Most likely you have a memory leak and the OS's OOM killer is assassinating your process for bad behavior. To check, run:
grep oom /var/log/messages
If you see messages, that's your problem.
If you don’t find anything, try running the periodic process manually in a shell:
MyPeriodicTask().run()
and see what happens. If you don't have good instrumentation like Cacti, Ganglia, etc. for this host, I'd monitor system and process metrics with top in another terminal.
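If you'd rather get a number than eyeball top, here is a minimal sketch that wraps the manual run in Python's standard tracemalloc module and reports peak usage. The helper `run_with_memory_report` and the stand-in `leaky_task` are hypothetical names of mine, not part of Celery or supervisord; you would pass `MyPeriodicTask().run` instead. Note that tracemalloc only sees allocations made through Python's allocator, so memory held by C extensions may not show up.

```python
import tracemalloc


def run_with_memory_report(task, *args, **kwargs):
    """Run task and return (result, current_bytes, peak_bytes).

    The byte counts cover Python-level allocations traced while task ran.
    """
    tracemalloc.start()
    try:
        result = task(*args, **kwargs)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


def leaky_task(n=100_000):
    # Hypothetical stand-in for MyPeriodicTask().run():
    # it builds a large list in memory before returning.
    hoard = [str(i) for i in range(n)]
    return len(hoard)


if __name__ == "__main__":
    result, current, peak = run_with_memory_report(leaky_task)
    print(f"result={result} current={current / 1e6:.1f} MB peak={peak / 1e6:.1f} MB")
```

A peak that keeps climbing as you raise the input size, or across repeated runs, would be consistent with the leak theory.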
10
You see this kind of error when an asynchronous task (through Celery) or the script you are running is storing a lot of data in memory, for example because it leaks.
In my case, I was getting data from another system and saving it in a variable, so that I could export all of it (into a Django model / Excel file) once the process finished.
Here is the catch: my script was gathering 10 million records, and it was leaking memory the whole time it gathered them. That is what raised the exception.
To overcome the issue, I divided the 10 million records into 20 parts of half a million each. Every time the collected data reached 500,000 items, I wrote it out to my preferred destination (a local file or a Django model) and then started on the next batch.
The exact number of partitions doesn't matter. The idea is to solve a large problem by splitting it into subproblems and solving them one by one.
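The batching idea could look roughly like the sketch below. It is not my exact script: `batched`, `export_in_batches`, and the `save` callback are hypothetical names, and the 500,000 default simply mirrors the number above. The point is that only one batch is held in memory at a time.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []  # start a fresh list so the old one can be freed
    if batch:
        yield batch  # final, possibly shorter, batch


def export_in_batches(
    source: Iterable[T],
    save: Callable[[List[T]], None],
    batch_size: int = 500_000,
) -> int:
    """Pass each batch to save() and return the total number of items."""
    total = 0
    for batch in batched(source, batch_size):
        save(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    saved = []
    total = export_in_batches(range(10), saved.append, batch_size=4)
    print(total, [len(b) for b in saved])
```

With Django, `save` could be something like a function that calls `bulk_create` on your model with the batch. This only helps if `source` is a generator or cursor that produces items lazily, rather than a list that is already fully loaded.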