[Django]-Multiple annotate Sum terms yields inflated answer

17👍

The bug is reported here but it’s not solved yet even in Django 1.11. The issue is related to joining two tables in reverse relations.
Notice that distinct parameter works well for Count but not for Sum. So you can use a trick and write an ORM like below:

 Projects.objects.annotate(
      temp_tasks_duration=Sum('task__duration'),
      temp_subtasks_duration=Sum('task__subtask__duration'),
      tasks_count=Count('task'),
      tasks_count_distinct=Count('task', distinct=True),
      task_subtasks_count=Count('task__subtask'),
      task_subtasks_count_distinct=Count('task__subtask', distinct=True),
 ).annotate(
      tasks_duration=F('temp_tasks_duration')*F('tasks_count_distinct')/F('tasks_count'),
      subtasks_duration=F('temp_subtasks_duration')*F('subtasks_count_distinct')/F('subtasks_count'),
 )

Update:
I found that you need to use Subquery. In the following solution, firstly you filter tasks for related to the outerref (OuterRef references to the outer query, so the tasks are filtered for each Project), then you group the tasks by ‘project’, so that the Sum applies on all the tasks of each projects and returns just one result if any task exists for the project (you have filtered by ‘project’ and then grouped by that same field; That’s why just one group can be there.) or None otherwise. The result would be None if the project has no task, that means we can not use [0] to select the calculated sum.

from django.db.models import Subquery, OuterRef
Projects.objects.annotate(
    tasks_duration=Subquery(
        Task.objects.filter(
            project=OuterRef('pk')
        ).values(
            'project'
        ).annotate(
            the_sum=Sum('task__duration'),
        ).values('the_sum')[:1]
    ),
    subtasks_duration=Sum('task__subtask__duration')
)

Running this code will send just one query to the database, so the performance is great.

1👍

I get this error as well. Exact same code. It works if I do the aggregation separately, but once I try to get both sums at the same time, one of them gets a factor 2 higher, and the other a factor 3.

I have no idea why Django behaves this way. I have filed a bug report here:
https://code.djangoproject.com/ticket/19011
You might be interested in following it as well.

👤tBuLi

Leave a comment