[Django]-Django 2.1+ bulk update records with the count of their related records?

6👍

You need a subquery for the count (rather than just the countable objects), which is a bit of a hassle as regular aggregate queries like count() or aggregate() tend to execute immediately, whereas subqueries need to be lazy. That’s why the annotate() workaround below is needed:

from django.db.models import Subquery, OuterRef, Count

table_b_subquery = Subquery(TableB.objects
    .filter(a_id=OuterRef('id'))
    .values('a_id')
    .annotate(cnt=Count('a_id'))
    .values('cnt')
)

TableA.objects.update(table_b_count=table_b_subquery)

3👍

Endre’s answer is what I’m looking for! I just found a new feature in Django 2.2 that might be worth using though and I’m conflicted. I’m not sure which is better.

Django 2.2 has bulk_update

https://docs.djangoproject.com/en/2.2/ref/models/querysets/#django.db.models.query.QuerySet.bulk_update

So with bulk_update for my question, I’d do:

records = []
for record in TableA.objects.all():
  counted_record = record.table_b_count = record.table_b_set.count()
  records.append(counted_record)

TableA.objects.bulk_update(records, ['table_b_count'], batch_size=100000) 

# This is acting on Table A with 1,000,000 rows and Table B with 5,000,000 rows.

Anyone have thoughts on whether Endre’s answer or the new Django 2.2 method is better in this case, with millions of rows?

Leave a comment