[Django]-Django- Group by and Count by unique together

1👍

You can do this,

from django.db.models import Count

result = PostView.objects.values(
    "created__hour",
    "post",
    "user"
).annotate(count=Count("id"))

print(list(result))

# Result
# [{'created__hour': 17, 'post': 1, 'user': 1, 'count': 4}, {'created__hour': 17, 'post': 2, 'user': 1, 'count': 3}]
👤JPG

1👍

Short Answer SQL & Django

select a.day_hour, count(*) from (select strftime('%Y-%m-%d %H', created) as day_hour, 
user_id, count(*)  from post_postview 
where post_id=1 group by strftime('%Y-%m-%d %H', created), user_id) 
a group by a.day_hour

Django Answer

In [140]: rs = PostView.objects.filter(post_id=1).extra(
{'date_hour': u"strftime('%%Y-%%m-%%d %%H', created)"}).order_by('date_hour').values('user_id', 'date_hour').annotate(count=Count('user_id', distinct=True))

In [141]: rs
Out[141]: <QuerySet [{'date_hour': '2021-05-28 10', 
'user_id': 2, 'count': 1}, {'date_hour': '2021-05-28 10', 
'user_id': 3, 'count': 1}, {'date_hour': '2021-05-28 11', 
'user_id': 2, 'count': 1}, {'date_hour': '2021-05-28 11', 
'user_id': 3, 'count': 1}]>

In [142]: rs.values('date_hour').distinct()
Out[142]: <QuerySet [{'date_hour': '2021-05-28 10'}, 
{'date_hour': '2021-05-28 11'}]>

You need to group by twice. First time on date_hour and user_id and second time on the existing result set on date_hour.

Long Answer:

Since the query is on two levels(date level and unique user) you need two queries.

In the first step, you group the post created by post_hour. Without this fundamental aggregation result will show wrong values.

db.sqlite3> select strftime('%Y-%m-%d %H', created) as 
day_hour, user_id, count(*)  from post_postview where 
post_id=1 group by strftime('%Y-%m-%d %H', created), user_id
+---------------+---------+----------+
| day_hour      | user_id | count(*) |
+---------------+---------+----------+
| 2021-05-28 10 | 2       | 1        |
| 2021-05-28 10 | 3       | 2        |
| 2021-05-28 11 | 2       | 3        |
| 2021-05-28 11 | 3       | 2        |
+---------------+---------+----------+

As you can see for the same time interval(2021-05-28 10),
there are 2 rows. Now to count this two rows, the extra query is required.

Again applying the same group by day_hour, we get the results per hour.

select a.day_hour, count(*) from (select strftime('%Y-%m-%d 
%H', created) as day_hour, user_id, count(*)  from 
post_postview where post_id=1 group by strftime('%Y-%m-%d 
%H', created), user_id) a group by a.day_hour;

+---------------+----------+
| day_hour      | count(*) |
+---------------+----------+
| 2021-05-28 10 | 2        |
| 2021-05-28 11 | 2        |
+---------------+----------+

Here I have used SQLite specific strftime which is important piece.

The same code is ported to Django as

In [145]: 
PostView.objects.filter(post_id=1).extra({'date_hour': 
u"strftime('%%Y-%%m-%%d %%H', 
created)"}).order_by('date_hour').values('user_id', 
'date_hour').values('date_hour').distinct()
Out[145]: <QuerySet [{'date_hour': '2021-05-28 10'}, 
{'date_hour': '2021-05-28 11'}]>

extra methods let’s you inject SQL specific function and results afterwards follow general Django order_by and distinct. SQLite doesn’t support distinct on.

Leave a comment