67đź‘Ť
It’s also possible to create a subclass of Subquery
, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = models.IntegerField()
You then use this as you would the original Subquery
class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.
70đź‘Ť
Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn’t required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by()
to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.
- [Django]-Saving ModelForm error(User_Message could not be created because the data didn't validate)
- [Django]-Converting Django QuerySet to pandas DataFrame
- [Django]-Equivalent of PHP "echo something; exit();" with Python/Django?
20đź‘Ť
Problem
The problem is that Django adds GROUP BY
as soon as it sees using an aggregate function.
Solution
So you can just create your own aggregate function but so that Django thinks it is not aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
date = DateField()
class WorkPeriod(models.Model):
employee = models.ForeignKey(User, on_delete=models.CASCADE)
start_date = DateField()
end_date = DateField()
number_of_non_work_days = NonWorkDay.objects.filter(
date__gte=OuterRef('start_date'),
date__lte=OuterRef('end_date'),
).annotate(
cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
number_of_word_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!
- [Django]-Django gunicorn sock file not created by wsgi
- [Django]-Debugging Apache/Django/WSGI Bad Request (400) Error
- [Django]-How can I list urlpatterns (endpoints) on Django?
18đź‘Ť
I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here’s what I’ve seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY
clause with the fields you select (basically what’s in your values()
query selection). After investigating with the MariaDB command line tool why I’m getting NULL
s or None
s on the query results, I’ve came to the conclusion that the GROUP BY
clause will cause the COUNT()
to return NULL
s.
Then, I started diving into the QuerySet
interface to see how can I manually, forcibly remove the GROUP BY
from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by
field on the subquery’s queryset in order for it to not have a GROUP BY
appended on it on execution time. Also, you’ll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically, and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I’ll create a bug report about it.
Edit: the bug report is here.
- [Django]-Accessing dictionary by key in Django template
- [Django]-Celery. Decrease number of processes
- [Django]-Pycharm error Django is not importable in this environment
8đź‘Ť
A solution which would work for any general aggregation could be implemented using Window
classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
- [Django]-Is virtualenv recommended for django production server?
- [Django]-Using IntellijIdea within an existing virtualenv
- [Django]-Django: Safely Remove Old Migrations?
5đź‘Ť
If I understand correctly, you are trying to count Space
s available in a Carpark
. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count
value in your results.
OK, I have seen your note…
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your “simplest real-world example” is too simple… can you share the models or other information?
- [Django]-Django models: mutual references between two classes and impossibility to use forward declaration in python
- [Django]-How can I call a custom Django manage.py command directly from a test driver?
- [Django]-How to get a particular attribute from queryset in Django in view?
2đź‘Ť
“works for me” doesn’t help very much. But.
I tried your example on some models I had handy (the Book -> Author
type), it works fine for me in django 1.11b1.
Are you sure you’re running this in the right version of Django? Is this the actual code you’re running? Are you actually testing this not on carpark
but some more complex model?
Maybe try to print(thequery.query)
to see what SQL it’s trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.
- [Django]-Django's ManyToMany Relationship with Additional Fields
- [Django]-Manually logging in a user without password
- [Django]-Django connection to postgres by docker-compose