15đź‘Ť
Because median isn’t a SQL aggregate. See, for example, the list of PostgreSQL aggregate functions and the list of MySQL aggregate functions.
27đź‘Ť
Here’s your missing function. Pass it a queryset and the name of the column that you want to find the median for:
def median_value(queryset, term):
count = queryset.count()
return queryset.values_list(term, flat=True).order_by(term)[int(round(count/2))]
That wasn’t as hard as some of the other responses seem to indicate. The important thing is to let the db sorting do all of the work, so if you have the column already indexed, this is a super cheap operation.
(update 1/28/2016)
If you want to be more strict about the definition of median for an even number of items, this will average together the value of the two middle values.
def median_value(queryset, term):
count = queryset.count()
values = queryset.values_list(term, flat=True).order_by(term)
if count % 2 == 1:
return values[int(round(count/2))]
else:
return sum(values[count/2-1:count/2+1])/Decimal(2.0)
- Django. Error message for login form
- Django template keyword `choice_value` in no longer work in 1.11
- Django conditional annotation
7đź‘Ť
Well, the reason is probably that you need to track all the numbers to calculate median. Avg, Count, Max, Min, StDev, Sum, and Variance can all be calculated with constant storage needs. That is, once you “record” a number you’ll never need it again.
FWIW, the variables you need to track are: min, max, count, <n>
= avg, <n^2>
= avg of the square of the values.
- Casting from base Model instance to derived proxy Model in Django?
- SQL injection hacks and django
- How can I use AWS's Dynamo Db with Django?
- Enable PK based filtering in Django Graphene Relay while retaining Global IDs
2đź‘Ť
A strong possibility is that median is not part of standard SQL.
Also, it requires a sort, making it quite expensive to compute.
- Reverse() argument after ** must be a mapping
- X-editable inline editing in Django – how to get CSRF protection?
- Django DEBUG=False still runs in debug mode
- Force delete of any previous test database (autoclobber) when running Django unit tests, eg, in PyCharm
2đź‘Ť
I have no idea what db backend you are using, but if your db supports another aggregate, or you can find a clever way of doing it, You can probably access it easily by Aggregate.
- Celery – No module named five
- Django: values_list() multiple fields concatenated
- Django template rows of multiple items
1đź‘Ť
FWIW, you can extend PostgreSQL 8.4 and above to have a median aggregate function with these code snippets.
Other code snippets (which work for older versions of PostgreSQL) are shown here. Be sure to read the comments for this resource.
- Django – Dictionary update sequence element #0 has length 1; 2 is required
- Django extract string from [ErrorDetail(string='Test Message', code='invalid')]
0đź‘Ť
I have implemented the following version now as I also got some weird results with the above implementation and rounding behaviour.
Using Python 3.11.4
def median_value(queryset, column_name):
count = queryset.count()
values = queryset.values_list(column_name, flat=True).order_by(column_name)
if count == 0:
return Decimal("0")
if count % 2 == 1:
return values[int(count/2)]
else:
return sum(values[int(count/2)-1:int(count/2)+1])/Decimal(2.0)
- AWS Elastic Beanstalk Container Commands Failing
- Django media url is not resolved in 500 internal server error template
- Faster alternative to manage.py collectstatic (w/ s3boto storage backend) to sync static files to s3?