4👍
What is the problem?
As others have mentioned in the comments, the problem is that the functions called via the post_save signal are taking a long time. (Remember that signals are not asynchronous – this is a common misconception. Each receiver runs synchronously, inside the request, before save() returns.)
I’m not familiar with django-river, but taking a quick look at the functions that get called post-save (see here and here), we can see that they involve additional calls to the database.
Whilst you save a lot of individual db hits by using bulk_create, you are still calling the database multiple times for each post_save signal.
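To make the cost concrete, here is a pure-Python simulation (not django-river's actual code – the receiver and the query counts are made up for illustration) of how synchronous signal receivers multiply the number of database round-trips per saved object:

```python
# Illustration only: a minimal stand-in for Django's signal dispatch,
# counting how many "queries" fire when each saved object triggers a
# post_save receiver that itself hits the DB twice.

db_queries = 0

class FakeSignal:
    def __init__(self):
        self.receivers = []

    def connect(self, fn):
        self.receivers.append(fn)

    def send(self, instance):
        # Receivers run synchronously, just like Django signals.
        for fn in self.receivers:
            fn(instance)

post_save = FakeSignal()

def workflow_receiver(instance):
    """Hypothetical receiver making two extra DB calls per save."""
    global db_queries
    db_queries += 2  # e.g. one SELECT for state, one INSERT for history

post_save.connect(workflow_receiver)

def save_individually(objects):
    global db_queries
    for obj in objects:
        db_queries += 1      # the INSERT for the object itself
        post_save.send(obj)  # receivers fire inline, per object

save_individually(range(100))
print(db_queries)  # 300 queries: 100 inserts + 200 from the receiver
```

The per-object work in the receiver dominates as soon as the batch grows, which is exactly why the bulk insert alone doesn't rescue the request time.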
What can be done about it?
In short: not much! For the vast majority of Django requests, the slow part is calling the database. This is why we try to minimise the number of calls to the db (using things like bulk_create).
Reading through the first few paragraphs of the django-river docs, the whole idea is to move things that would normally live in code into the database. The big advantage is that you don’t need to re-write code and re-deploy so often. The disadvantage is that you inevitably have to refer to the database more, which slows things down. This will be fine for some use-cases, but not all.
There are two things I can think of which might help:
- Does all of this currently happen as part of the request/response cycle? And if it does, does it need to? If the answers to these two questions are ‘yes’ and ‘no’ respectively, then you could move this work to a separate task queue. It will still be slow, but at least it won’t slow down your site.
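A sketch of that first option, using the stdlib ThreadPoolExecutor so the example is self-contained (a real app would hand the work to a task queue such as Celery or django-q; slow_workflow_step and create_objects_view are invented names):

```python
# Move slow post-save work off the request path: submit it to a pool
# and return the response without waiting for it to finish.
from concurrent.futures import ThreadPoolExecutor
import time

executor = ThreadPoolExecutor(max_workers=2)

def slow_workflow_step(object_ids):
    """Hypothetical stand-in for the work the post_save receivers do."""
    time.sleep(0.1)  # pretend this is several DB round-trips
    return len(object_ids)

def create_objects_view(object_ids):
    # ...bulk_create the objects here...
    future = executor.submit(slow_workflow_step, object_ids)  # fire and continue
    return future  # the HTTP response would be returned here, immediately

start = time.monotonic()
future = create_objects_view([1, 2, 3])
elapsed = time.monotonic() - start
print(elapsed < 0.1)    # the "view" returned almost immediately
print(future.result())  # the background work still completes: 3
```

The total work is the same; the difference is that the user stops paying for it inside the request.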
- Depending on exactly what your workflows are and the nature of the data you are creating, you may be able to do everything the post_save signals are doing in your own function, and do it more efficiently (e.g. in bulk). But this will definitely depend upon your data and your app, and it moves away from the philosophy of django-river.
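To illustrate the second option, here is a sketch of replacing per-object signal work with one bulk operation. sqlite3 stands in for the Django ORM so the example runs anywhere; in Django this would be a single HistoryEntry.objects.bulk_create(...) (a hypothetical model) instead of one INSERT per post_save:

```python
# One bulk call instead of N per-signal calls.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (object_id INTEGER, state TEXT)")

new_ids = list(range(100))

# Per-signal version: one INSERT per saved object (what receivers do):
# for oid in new_ids:
#     conn.execute("INSERT INTO history VALUES (?, ?)", (oid, "created"))

# Bulk version: the same rows in a single call.
conn.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [(oid, "created") for oid in new_ids],
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(count)  # 100
```

The rows written are identical; you have just traded 100 round-trips for one.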
1👍
Use a separate worker if the "signal" logic allows it to be executed after the bulk save.
You can create an additional queue table and store metadata there describing what your future worker should do.
Create a separate worker (a Django module) with the needed logic, reading its data from the queue table. You can implement it as a management command; this lets you run the worker from the main flow (you can call management commands from regular Django code), or you can run it on a schedule via crontab.
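A minimal sketch of the queue-table pattern, using sqlite3 so it is self-contained; in Django the table would be a model and run_worker would live in a management command's handle() (jobs, enqueue, and run_worker are invented names):

```python
# Record "what to do" at save time; process it later in a worker.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0)"
)

def enqueue(conn, payload):
    """Called right after the bulk save: record what the worker must do."""
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),))
    conn.commit()

def run_worker(conn):
    """The worker (e.g. a management command) drains pending jobs."""
    processed = 0
    for job_id, payload in conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0"
    ).fetchall():
        data = json.loads(payload)
        # ...do the slow workflow transition for data["object_ids"]...
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
        processed += 1
    conn.commit()
    return processed

enqueue(conn, {"object_ids": [1, 2, 3], "action": "advance_workflow"})
enqueue(conn, {"object_ids": [4, 5], "action": "advance_workflow"})
print(run_worker(conn))  # 2
print(run_worker(conn))  # 0 - nothing left to do
```

Marking rows as done (rather than deleting them) also gives you an audit trail and makes the worker safe to re-run.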
How to run such a worker?
If you need the work done as soon as you’ve created the records, run it in a separate thread using the threading module; the request-response lifecycle then finishes right after the new thread is started. If the work can wait, set up a schedule and run it via crontab using the management command framework.
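The threading variant can be sketched as follows (process_queue and view_after_bulk_create are illustrative names, not Django or django-river APIs):

```python
# Start the worker in a background thread; the "view" returns immediately.
import threading
import time

results = []

def process_queue():
    """Stand-in for the worker logic reading the queue table."""
    time.sleep(0.05)  # pretend this is the slow DB work
    results.append("processed")

def view_after_bulk_create():
    worker = threading.Thread(target=process_queue, daemon=True)
    worker.start()  # returns immediately; work continues in the background
    return worker   # the HTTP response would be sent here

worker = view_after_bulk_create()
worker.join()  # only for the demo - a real view would not wait
print(results)  # ['processed']
```

One caveat worth knowing: threads started this way die with the web process, so anything that must not be lost is better served by the queue-table-plus-crontab approach above.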