148👍
Choosing between `len()` and `count()` depends on the situation, and it's worth understanding deeply how they work in order to use them correctly.

Let me provide you with a few scenarios:
- (most crucial) When you only want to know the number of elements and you do not plan to process them in any way, it's crucial to use `count()`:

  DO: `queryset.count()` – this performs a single `SELECT COUNT(*) FROM some_table` query; all computation is carried out on the RDBMS side, and Python just retrieves the result number at a fixed cost of O(1).

  DON'T: `len(queryset)` – this performs a `SELECT * FROM some_table` query, fetching the whole table in O(N) time and requiring additional O(N) memory to store it. This is the worst thing that can be done.
- When you intend to fetch the queryset anyway, it's slightly better to use `len()`, which won't cause an extra database query as `count()` would.

  `len()` (one db query):

```
len(queryset)         # SELECT * fetching all the data - no extra cost, the data would be fetched anyway in the for loop
for obj in queryset:  # data is already fetched by len() - using the cache
    pass
```

  `count()` (two db queries!):

```
queryset.count()      # first db query: SELECT COUNT(*)
for obj in queryset:  # second db query (fetching the data): SELECT *
    pass
```
- Reversed 2nd case (when the queryset has already been fetched):

```
for obj in queryset:  # iteration fetches the data
    pass
len(queryset)         # uses already cached data - O(1), no extra cost
queryset.count()      # uses the cache - O(1), no extra db query
len(queryset)         # the same - O(1)
queryset.count()      # the same - no query, O(1)
```
Everything will be clear once you take a glance "under the hood":
```python
class QuerySet(object):
    def __init__(self, model=None, query=None, using=None, hints=None):
        # (...)
        self._result_cache = None

    def __len__(self):
        self._fetch_all()
        return len(self._result_cache)

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = list(self.iterator())
        if self._prefetch_related_lookups and not self._prefetch_done:
            self._prefetch_related_objects()

    def count(self):
        if self._result_cache is not None:
            return len(self._result_cache)
        return self.query.get_count(using=self.db)
```
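The caching behavior above can be sketched outside Django with a minimal stand-in class. This is a hypothetical simulation, not Django code: `FakeQuerySet` just mimics the `_result_cache` pattern and counts simulated database round trips.

```python
# Minimal sketch of the QuerySet result-cache pattern.
# FakeQuerySet and its "database" are hypothetical stand-ins, not Django code.

class FakeQuerySet:
    def __init__(self, rows):
        self._rows = rows            # stands in for the database table
        self._result_cache = None
        self.db_queries = 0          # counts simulated round trips

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __len__(self):
        self._fetch_all()            # len() forces a full "SELECT *"
        return len(self._result_cache)

    def count(self):
        if self._result_cache is not None:
            return len(self._result_cache)   # cache hit: no query
        self.db_queries += 1                 # simulated SELECT COUNT(*)
        return len(self._rows)

    def _fetch_all(self):
        if self._result_cache is None:
            self.db_queries += 1             # simulated SELECT *
            self._result_cache = list(self._rows)

qs = FakeQuerySet(range(5))
qs.count()                 # one COUNT query
assert qs.db_queries == 1
len(qs)                    # one full fetch, populating the cache
assert qs.db_queries == 2
qs.count()                 # served from the cache: no new query
assert qs.db_queries == 2
```

This mirrors why `count()` is free once the queryset has been evaluated, and why calling it before evaluation still costs a query.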
Good references in the Django docs.
168👍
Although the Django docs recommend using `count()` rather than `len()`:

> Note: Don't use `len()` on QuerySets if all you want to do is determine the number of records in the set. It's much more efficient to handle a count at the database level, using SQL's `SELECT COUNT(*)`, and Django provides a `count()` method for precisely this reason.
Since you are iterating over this QuerySet anyway, the result will be cached (unless you are using `iterator()`), and so it will be preferable to use `len()`, since this avoids hitting the database again, and also avoids the possibility of retrieving a different number of results!
If you are using `iterator()`, then I would suggest including a counting variable as you iterate through (rather than using `count()`), for the same reasons.
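Counting while iterating might look like the sketch below. A plain generator stands in for `queryset.iterator()` here, since in that mode results are streamed and nothing is cached:

```python
# Sketch: with .iterator(), results are streamed and NOT cached, so
# count during the single pass instead of calling len() or .count().

def stream_rows():
    # hypothetical stand-in for queryset.iterator()
    for i in range(1, 6):
        yield {"id": i}

n = 0
for row in stream_rows():
    n += 1          # count as you go: no second query, no caching
    # ... process row ...

print(n)  # 5
```

The count is then guaranteed to match the rows you actually processed, which a separate `count()` query cannot promise.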
29👍
I think using `len(qs)` makes more sense here, as you need to iterate over the results. `qs.count()` is a better option if all you want to do is print the count and not iterate over the results.

`len(qs)` will hit the database with `SELECT * FROM table`, whereas `qs.count()` will hit the db with `SELECT COUNT(*) FROM table`.

Also, `qs.count()` will return an integer, and you cannot iterate over it.
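A tiny illustration of that last point, with a plain list standing in for a queryset:

```python
qs = ["a", "b", "c"]   # stand-in for a queryset

n = len(qs)            # an int, here 3
assert n == 3

# An int cannot be iterated over:
try:
    for x in n:
        pass
except TypeError:
    print("can't iterate an int")
```

So if you need both the number of records and the records themselves, take `len()` of the evaluated queryset rather than iterating over what `count()` returns.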
6👍
For people who prefer test measurements (PostgreSQL):

If we have a simple Person model and 1000 instances of it:

```python
class Person(models.Model):
    name = models.CharField(max_length=100)
    age = models.SmallIntegerField()

    def __str__(self):
        return self.name
```
In the average case it gives:

```
In [1]: persons = Person.objects.all()

In [2]: %timeit len(persons)
325 ns ± 3.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit persons.count()
170 ns ± 0.572 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
So, as you can see, `count()` is almost 2x faster than `len()` in this particular test case.
3👍
Summarizing what others have already answered:

- `len()` will fetch all the records and iterate over them.
- `count()` will perform an SQL COUNT operation (much faster when dealing with a big queryset).

It is also true that if, after this operation, the whole queryset will be iterated, then as a whole it could be slightly more efficient to use `len()`.
However, in some cases, for instance when you have memory limitations, it can be convenient (when possible) to split the operation performed over the records. That can be achieved using Django pagination.

Then, `count()` would be the choice, and you could avoid having to fetch the entire queryset at once.
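A sketch of that idea with plain slicing. A list stands in for the queryset here; on a real queryset each slice becomes a `LIMIT`/`OFFSET` query, and `total` would come from `queryset.count()`, i.e. a cheap `SELECT COUNT(*)`:

```python
# Sketch: process records in pages so the whole result set is never
# held in memory at once. A plain list stands in for the queryset.

records = list(range(2500))   # pretend table with 2500 rows
PAGE_SIZE = 1000

total = len(records)          # queryset.count() in real code: one COUNT(*) query
processed = 0
for offset in range(0, total, PAGE_SIZE):
    page = records[offset:offset + PAGE_SIZE]   # one LIMIT/OFFSET page at a time
    processed += len(page)    # only PAGE_SIZE rows in memory per step

assert processed == total
```

Django's `Paginator` wraps the same pattern, and it calls `count()` internally to learn the total before fetching any page.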
1👍
I experimented with how fast model and raw queries can handle 10 million rows with `count()` and `len()`. *I used PostgreSQL.
<Result of the experiment>

| | count() | len() |
|---|---|---|
| Model query | 1.02 seconds | 46.13 seconds |
| Raw query | 0.48 seconds | 3.16 seconds |
So, from fastest to slowest:

1. `count()` with a raw query (0.48 seconds)
2. `count()` with a model query (1.02 seconds)
3. `len()` with a raw query (3.16 seconds)
4. `len()` with a model query (46.13 seconds)
I recommend using `count()` with a model query as the default, because it's faster than `len()`, and it's less code and more convenient than a raw query. However, when using `select_for_update()`, you should use `len()` with a model query, because `select_for_update()` with `count()` doesn't work, and it's still less code and more convenient than a raw query.
<How to experiment>
First, I created a `Test` model with only an `id` column:

```python
# "store/models.py"
from django.db import models

class Test(models.Model):
    pass
```
Then, I ran the commands below:

```
python manage.py makemigrations && python manage.py migrate
```
Then, I inserted 10 million rows into the `store_test` table with psql at once:

```
postgres=# INSERT INTO store_test (id) SELECT generate_series(1, 10000000);
INSERT 0 10000000
Time: 29929.337 ms (00:29.929)
```
Lastly, I ran `test_view()` as shown below:

```python
# "store/views.py"
from time import time
from .models import Test
from django.db import connection
from django.http import HttpResponse

def test_view(request):
    # "count()" with model query
    start = time()
    print(Test.objects.all().count(), "- count() - Model query")
    end = time()
    print(round(end - start, 2), "seconds\n")

    # "len()" with model query
    start = time()
    print(len(Test.objects.all()), "- len() - Model query")
    end = time()
    print(round(end - start, 2), "seconds\n")

    # "count()" with raw query
    start = time()
    with connection.cursor() as cursor:
        cursor.execute("SELECT count(*) FROM store_test;")
        print(cursor.fetchone()[0], "- count() - Raw query")
    end = time()
    print(round(end - start, 2), "seconds\n")

    # "len()" with raw query
    start = time()
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM store_test;")
        print(len(cursor.fetchall()), "- len() - Raw query")
    end = time()
    print(round(end - start, 2), "seconds\n")

    return HttpResponse("Test_view")
```
Output on console:
```
10000000 - count() - Model query
1.02 seconds

10000000 - len() - Model query
46.13 seconds

10000000 - count() - Raw query
0.48 seconds

10000000 - len() - Raw query
3.16 seconds

[18/Dec/2022 07:12:14] "GET /store/test_view/ HTTP/1.1" 200 9
```