[Answered ]-Using annotate and distinct(field) together in Django

1👍

This will be a bit badly readable because of your related names. I would suggest to change Review.user.related_name to reviews, it will make this much more understandable, but I’ve elaborated on that in the second part of the answer.

With your current setup, I managed to do it fully in the DB using subqueries:

from django.db.models import Subquery, OuterRef, Count

# No DB Queries
best_reviews_per_user = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(user=OuterRef('id'))

# No DB Queries
review_sq = Subquery(best_reviews_per_user.values('review_id')[:1])

# First DB Query
best_review_ids = User.objects.all()\
    .annotate(best_review_id=review_sq)\
    .values_list('best_review_id', flat=True)

# Second DB Query
best_reviews = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(review_id__in=best_review_ids)\
    .exclude(num_likes=0)  # I assume this is the case


# Print it
for review in best_reviews:
    print(review, review.num_likes, review.user)

# Test it
assert len({review.user for review in best_reviews}) == len(best_reviews)
assert sorted([r.num_likes for r in best_reviews], reverse=True) == [r.num_likes for r in best_reviews]
assert all([r.num_likes for r in best_reviews])

Let’s try with this completely equivalent model structure:

from django.db import models
from django.utils import timezone


class TimestampedModel(models.Model):
    """This makes your life much easier and is pretty DRY"""
    created = models.DateTimeField(default=timezone.now)
    class Meta:
        abstract = True


class Review(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='reviews', db_index=True)
    text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField()
    likes = models.ManyToManyField(User, through='ReviewLike')


class ReviewLike(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, db_index=True)
    review = models.ForeignKey(Review, on_delete=models.CASCADE, db_index=True)

The likes are a clear m2m relationship between reviews and users, with an extra timestamp column – it’s a model use for a Through model. Docs here.

Now everything is imho much much easier to read.

from django.db.models import OuterRef, Count, Subquery


# No DB Queries
best_reviews = Review.objects.all()\
    .annotate(like_count=Count('likes'))\
    .exclude(like_count=0)\
    .order_by('-like_count')\

# No DB Queries
sq = Subquery(best_reviews.filter(user=OuterRef('id')).values('id')[:1])

# First DB Query
user_distinct_best_review_ids = User.objects.all()\
    .annotate(best_review=sq)\
    .values_list('best_review', flat=True)

# Second DB Query
best_reviews = best_reviews.filter(id__in=user_distinct_best_review_ids).all()

0👍

One way of doing it is as follows:

  1. Get a list of tuples that represent the user.id and review.id, ordered by user and number of likes ASCENDING
  2. Convert the list to a dict to remove duplicate user.ids. Later items replace earlier ones, which is why the ordering in step 1 is important
  3. Create a list of review.ids from the values in the dict
  4. Get a queryset using the list of review.ids, ordered by the number of likes DESCENDING
from django.db.models import Count

user_review_list = Review.objects\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('user', 'num_likes')\
    .values_list('user', 'pk')

user_review_dict = dict(user_review_list)
review_pk_list = list(user_review_dict.values())

reviews = Review.objects\
    .annotate(num_likes=Count('likereview_review'))\
    .filter(pk__in=review_pk_list)\
    .order_by('-num_likes')

Leave a comment