[Fixed]-Optimise Django subquery (currently using `annotate` and `Count`)


One possible solution is to use a correlated subquery:

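```python
from django.db.models import Exists, OuterRef

# Other Relationship rows (different id) that share the outer row's source.
duplicates = Relationship.objects.exclude(
    id=OuterRef('id')
).filter(source=OuterRef('source'))

# Keep only the rows for which at least one such duplicate exists.
Relationship.objects.annotate(
    duplicated=Exists(duplicates)
).filter(
    duplicated=True
)
```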

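What this does is build a queryset (which can’t be evaluated on its own, because it refers to the outer query through `OuterRef`) that contains only the *other* rows whose “source” value matches the outer row’s. `Exists` turns that subquery into a boolean annotation, and the filter then keeps only the rows for which such a duplicate exists.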
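Written out as plain SQL, the correlated subquery may be easier to see. This is a sketch using Python's built-in `sqlite3` module and a hypothetical `relationship` table with `id` and `source` columns; the SQL Django actually generates differs in aliasing and other details.

```python
import sqlite3

# Hypothetical table mirroring the question's model: an id plus a source column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationship (id INTEGER PRIMARY KEY, source INTEGER)")
conn.executemany(
    "INSERT INTO relationship (source) VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (3,)],
)

# Correlated subquery: for each outer row r, look for another row d
# (different id) with the same source. EXISTS is true if any such row exists.
rows = conn.execute(
    """
    SELECT r.id, r.source
    FROM relationship AS r
    WHERE EXISTS (
        SELECT 1
        FROM relationship AS d
        WHERE d.source = r.source
          AND d.id <> r.id
    )
    ORDER BY r.id
    """
).fetchall()

print(rows)  # [(1, 1), (2, 1), (4, 3), (5, 3), (6, 3)]
```

The row with source 2 is dropped because no other row shares its source.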

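At the time of writing, it isn’t possible to do this in Django without an `.annotate(...).filter(...)` pair, and that can have performance implications: the EXISTS is added to the SELECT list and then referenced again in the WHERE clause, whereas evaluating it only once in the database (in the WHERE clause) can noticeably improve performance. (Django 3.0 later added support for passing boolean expressions such as `Exists(...)` directly to `.filter()`.)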
