[Django]-Query syntax to select exactly one item for each category

3👍

Your question isn’t entirely clear: since you didn’t say otherwise, I’m going to assume that you don’t care which item is selected for each category, just that you need any one. If that isn’t the case, please update the question to clarify.

tl;dr version: there is no documented
way to explicitly use GROUP BY
statements in Django, except by using
a raw query. See the bottom for code to do so.

The problem is that in doing what you’re looking for in SQL itself requires a bit of a hack. You can easily try this example with by entering sqlite3 :memory: at the command line:

CREATE TABLE category
(
  id INT
);

CREATE TABLE item
(
  id INT,
  category_id INT
);

INSERT INTO category VALUES (1);
INSERT INTO category VALUES (2);
INSERT INTO category VALUES (3);


INSERT INTO item VALUES (1,1);
INSERT INTO item VALUES (2,2);
INSERT INTO item VALUES (3,3);
INSERT INTO item VALUES (4,1);
INSERT INTO item VALUES (5,2);

SELECT id, category_id, COUNT(category_id) FROM item GROUP BY category_id;

returns

4|1|2
5|2|2
3|3|1

Which is what you’re looking for (one item id for each category id), albeit with an extraneous COUNT. The count (or some other aggregate function) is needed in order to apply the GROUP BY.

Note: this will ignore categories that don’t contain any items, which seems like sensible behaviour.

Now the question becomes, how to do this in Django?

The obvious answer is to use Django’s aggregation/annotation support, in particular, combining annotate with values as is recommend elsewhere to GROUP queries in Django.

Reading those posts, it would seem we could accomplish what we’re looking for with

Item.objects.values(‘id’).annotate(unneeded_count=Count(‘category_id’))

However this doesn’t work. What Django does here is not just GROUP BY "category_id", but groups by all fields selected (ie GROUP BY "id", "category_id")1. I don’t believe there is a way (in the public API, at least) to change this behaviour.

The solution is to fall back to raw SQL:

qs = Item.objects.raw('SELECT *, COUNT(category_id) FROM myapp_item GROUP BY category_id')

1: Note that you can inspect what queries Django is running with:

from django.db import connection
print connection.queries[-1]

Edit:

There are a number of other possible approaches, but most have (possibly severe) performance problems. Here are a couple:

1. Select an item from each category.

items = []
for c in Category.objects.all():
    items.append(c.item_set[0])

This is a more clear and flexible approach, but has the obvious disadvantage of requiring many more database hits.

2. Use select_related

items = Item.objects.select_related()

and then do the grouping/filtering yourself (in Python).

Again, this is perhaps more clear than using raw SQL and only requires one query, but this one query could be very large (it will return all items and their categories) and doing the grouping/filtering yourself is probably less efficient than letting the database do it for you.

Leave a comment