712👍
Your understanding is mostly correct:
-
select_related
: when the object that you’re going to be selecting is a single object, soOneToOneField
or aForeignKey
-
prefetch_related
: when you’re going to get a "set" of things, soManyToManyField
s as you stated or reverseForeignKey
s.
Just to clarify what I mean by reverse ForeignKey
s, here’s an example:
class ModelA(models.Model):
pass
class ModelB(models.Model):
a = ForeignKey(ModelA)
# Forward ForeignKey relationship
ModelB.objects.select_related('a').all()
# Reverse ForeignKey relationship
ModelA.objects.prefetch_related('modelb_set').all()
The difference is that:
select_related
does an SQL join and therefore gets the results back as part of the table from the SQL serverprefetch_related
on the other hand executes another query and therefore reduces the redundant columns in the original object (ModelA
in the above example)
You may use prefetch_related
for anything that you can use select_related
for.
The tradeoffs are that prefetch_related
has to create and send a list of IDs to select back to the server, this can take a while. I’m not sure if there’s a nice way of doing this in a transaction, but my understanding is that Django always just sends a list and says SELECT … WHERE pk IN (…,…,…) basically. In this case if the prefetched data is sparse (let’s say U.S. State objects linked to people’s addresses) this can be very good, however if it’s closer to one-to-one, this can waste a lot of communications. If in doubt, try both and see which performs better.
Everything discussed above is basically about the communications with the database. On the Python side however prefetch_related
has the extra benefit that a single object is used to represent each object in the database. With select_related
duplicate objects will be created in Python for each "parent" object. Since objects in Python have a decent bit of memory overhead this can also be a consideration.
41👍
Gone through the already posted answers. Just thought it would be better if I add an answer with actual example.
Let’ say you have 3 Django models which are related.
class M1(models.Model):
name = models.CharField(max_length=10)
class M2(models.Model):
name = models.CharField(max_length=10)
select_relation = models.ForeignKey(M1, on_delete=models.CASCADE)
prefetch_relation = models.ManyToManyField(to='M3')
class M3(models.Model):
name = models.CharField(max_length=10)
Here you can query M2
model and its relative M1
objects using select_relation
field and M3
objects using prefetch_relation
field.
However as we’ve mentioned M1
‘s relation from M2
is a ForeignKey
, it just returns only 1 record for any M2
object. Same thing applies for OneToOneField
as well.
But M3
‘s relation from M2
is a ManyToManyField
which might return any number of M1
objects.
Consider a case where you have 2 M2
objects m21
, m22
who have same 5 associated M3
objects with IDs 1,2,3,4,5
. When you fetch associated M3
objects for each of those M2
objects, if you use select related, this is how it’s going to work.
Steps:
- Find
m21
object. - Query all the
M3
objects related tom21
object whose IDs are1,2,3,4,5
. - Repeat same thing for
m22
object and all otherM2
objects.
As we have same 1,2,3,4,5
IDs for both m21
, m22
objects, if we use select_related option, it’s going to query the DB twice for the same IDs which were already fetched.
Instead if you use prefetch_related, when you try to get M2
objects, it will make a note of all the IDs that your objects returned (Note: only the IDs) while querying M2
table and as last step, Django is going to make a query to M3
table with the set of all IDs that your M2
objects have returned. and join them to M2
objects using Python instead of database.
This way you’re querying all the M3
objects only once which improves performance as python joins are cheaper than database joins.
- [Django]-Django-Bower + Foundation 5 + SASS, How to configure?
- [Django]-How to use pdb.set_trace() in a Django unittest?
- [Django]-Django – How to pass several arguments to the url template tag
38👍
Both methods achieve the same purpose, to forego unnecessary db queries. But they use different approaches for efficiency.
The only reason to use either of these methods is when a single large query is preferable to many small queries. Django uses the large query to create models in memory preemptively rather than performing on demand queries against the database.
select_related
performs a join with each lookup, but extends the select to include the columns of all joined tables. However this approach has a caveat.
Joins have the potential to multiply the number of rows in a query. When you perform a join over a foreign key or one-to-one field, the number of rows won’t increase. However, many-to-many joins do not have this guarantee. So, Django restricts select_related
to relations that won’t unexpectedly result in a massive join.
The "join in python" for prefetch_related
is a little more alarming than it should be. It creates a separate query for each table to be joined. It filters each of these table with a WHERE IN clause, like:
SELECT "credential"."id",
"credential"."uuid",
"credential"."identity_id"
FROM "credential"
WHERE "credential"."identity_id" IN
(84706, 48746, 871441, 84713, 76492, 84621, 51472);
Rather than performing a single join with potentially too many rows, each table is split into a separate query.
- [Django]-Default value for user ForeignKey with Django admin
- [Django]-How to access Enum types in Django templates
- [Django]-"<Message: title>" needs to have a value for field "id" before this many-to-many relationship can be used.
20👍
select_related():
-
can reduce multiple
SELECT
queries to only 1SELECT
query with Forward Foreign Key and Reverse Foreign Key in one-to-one relationship and with Forward Foreign Key in one-to-many and many-to-many relationships. -
cannot be used with Reverse Foreign Key in one-to-many and many-to-many relationships.
prefetch_related():
- can reduce multiple
SELECT
queries to 2SELECT
queries as a minimum with Forward Foreign Key and Reverse Foreign Key in one-to-one, one-to-many and many-to-many relationships.
*You can see my answer explaining the meaning of Forward Foreign Key and Reverse Foreign Key.
The following shows my experiments of select_related()
and prefetch_related()
with Forward Foreign Key and Reverse Foreign Key in one-to-one, one-to-many and many-to-many relationships.
<One-to-one relationship>
For example, there are Person
and PersonDetail
models which have one-to-one relationship as shown below:
# "app/models.py
from django.db import models
class Person(models.Model):
name = models.CharField(max_length=20)
def __str__(self):
return self.name
class PersonDetail(models.Model):
person = models.OneToOneField(Person, on_delete=models.CASCADE)
age = models.IntegerField()
gender = models.CharField(max_length=20)
def __str__(self):
return str(self.age) + " " + self.gender
And, there are Person
and PersonDetail
admins as shown below:
# "app/admin.py
from django.contrib import admin
from .models import Person, PersonDetail
class PersonDetailInline(admin.TabularInline):
model = PersonDetail
@admin.register(Person)
class PersonAdmin(admin.ModelAdmin):
inlines = (PersonDetailInline,)
list_display = ('id', 'name')
ordering = ('id',)
@admin.register(PersonDetail)
class PersonDetailAdmin(admin.ModelAdmin):
list_display = ('id', 'age', 'gender', 'person')
ordering = ('id',)
Then, Person
admin has 5 objects as shown below:
And, PersonDetail
admin has 5 objects as shown below:
<Forward Foreign Key>
Then, I iterate Person
model from PersonDetail
model as shown below:
for obj in PersonDetail.objects.all():
print(obj.person)
Then, these below are outputted on console:
John
David
Lisa
Kai
Anna
Then, 6 SELECT
queries are run as shown below. *I use PostgreSQL and these below are the query logs of PostgreSQL and you can see my answer explaining how to enable and disable the query logs on PostgreSQL:
Next, I iterate Person
model from PersonDetail
model with select_related("person")
as shown below. *The order of select_related()
and all()
don’t matter:
for obj in PersonDetail.objects.select_related("person").all():
print(obj.person)
Then, these below are outputted on console:
John
David
Lisa
Kai
Anna
Then, 1 SELECT
query is run as shown below:
Next, I iterate Person
model from PersonDetail
model with prefetch_related("person")
as shown below. *The order of prefetch_related()
and all()
don’t matter:
for obj in PersonDetail.objects.prefetch_related("person").all():
print(obj.person)
Then, these below are outputted on console:
John
David
Lisa
Kai
Anna
Then, 2 SELECT
queries are run as shown below:
<Reverse Foreign Key>
Next, I iterate PersonDetail
model from Person
model as shown below:
for obj in Person.objects.all():
print(obj.persondetail)
Then, these below are outputted on console:
32 Male
18 Male
26 Female
36 Male
21 Female
Then, 6 SELECT
queries are run as shown below:
Next, I iterate PersonDetail
model from Person
model with select_related("persondetail")
as shown below:
for obj in Person.objects.select_related("persondetail").all():
print(obj.persondetail)
Then, these below are outputted on console:
32 Male
18 Male
26 Female
36 Male
21 Female
Then, 1 SELECT
query is run as shown below:
Next, I iterate PersonDetail
model from Person
model with prefetch_related("persondetail")
as shown below:
for obj in Person.objects.prefetch_related("persondetail").all():
print(obj.persondetail)
Then, these below are outputted on console:
32 Male
18 Male
26 Female
36 Male
21 Female
Then, 2 SELECT
queries are run as shown below:
<One-to-many relationship>
For example, there are Category
and Product
models which have one-to-many relationship as shown below:
# "app/models.py"
from django.db import models
class Category(models.Model):
name = models.CharField(max_length=20)
def __str__(self):
return self.name
class Product(models.Model):
category = models.ForeignKey(Category, on_delete=models.CASCADE)
name = models.CharField(max_length=50)
price = models.DecimalField(decimal_places=2, max_digits=5)
def __str__(self):
return str(self.category) + " " + self.name + " " + str(self.price)
And, there are Category
and Product
admins as shown below:
# "app/admin.py"
from django.contrib import admin
from .models import Category, Product
@admin.register(Category)
class CategoryAdmin(admin.ModelAdmin):
list_display = ('id', 'name')
ordering = ('id',)
@admin.register(Product)
class ProductAdmin(admin.ModelAdmin):
list_display = ('id', 'name', 'price', 'category')
ordering = ('id',)
Then, Category
admin has 4 objects as shown below:
And, Product
admin has 6 objects as shown below:
<Forward Foreign Key>
Then, I iterate Category
model from Product
model as shown below:
for obj in Product.objects.all():
print(obj.category)
Then, these below are outputted on console:
Fruits
Fruits
Vegetable
Meat
Meat
Fish
Then, 7 SELECT
queries are run as shown below:
Next, I iterate Category
model from Product
model with select_related("category")
as shown below:
for obj in Product.objects.select_related("category").all():
print(obj.category)
Then, these below are outputted on console:
Fruits
Fruits
Vegetable
Meat
Meat
Fish
Then, 1 SELECT
query is run as shown below:
Next, I iterate Category
model from Product
model with prefetch_related("category")
as shown below:
for obj in Product.objects.prefetch_related("category").all():
print(obj.category)
Then, these below are outputted on console:
Fruits
Fruits
Vegetable
Meat
Meat
Fish
Then, 2 SELECT
queries are run as shown below:
<Reverse Foreign Key>
Next, I iterate Product
model from Category
model as shown below:
for obj in Category.objects.all():
print(obj.product_set.all())
Then, these below are outputted on console:
<QuerySet [<Product: Fruits Apple 10.00>, <Product: Fruits Orange 20.00>]>
<QuerySet [<Product: Vegetable Carrot 30.00>]>
<QuerySet [<Product: Meat Chicken 40.00>, <Product: Meat Beef 50.00>]>
<QuerySet [<Product: Fish Salmon 60.00>]>
Then, 5 SELECT
queries are run as shown below:
Next, I tries to iterate Product
model from Category
model with select_related("product_set")
as shown below:
for obj in Category.objects.select_related("product_set").all():
print(obj.product_set.all())
Then, the error below occurs because select_related("product_set")
cannot be used with Reverse Foreign Key:
django.core.exceptions.FieldError: Invalid field name(s) given in
select_related: ‘product_set’. Choices are: (none)
Actually, there is no error if I use select_related()
with no argument as shown below:
# ↓ No argument
for obj in Category.objects.select_related().all():
print(obj.product_set.all())
Then, these below are outputted on console:
<QuerySet [<Product: Fruits Apple 10.00>, <Product: Fruits Orange 20.00>]>
<QuerySet [<Product: Vegetable Carrot 30.00>]>
<QuerySet [<Product: Meat Chicken 40.00>, <Product: Meat Beef 50.00>]>
<QuerySet [<Product: Fish Salmon 60.00>]>
But, 5 SELECT
queries are still run instead of 1 SELECT
query as shown below:
Next, I iterate Product
model from Category
model with prefetch_related("product_set")
as shown below:
for obj in Category.objects.prefetch_related("product_set").all():
print(obj.product_set.all())
Then, these below are outputted on console:
<QuerySet [<Product: Fruits Apple 10.00>, <Product: Fruits Orange 20.00>]>
<QuerySet [<Product: Vegetable Carrot 30.00>]>
<QuerySet [<Product: Meat Chicken 40.00>, <Product: Meat Beef 50.00>]>
<QuerySet [<Product: Fish Salmon 60.00>]>
Then, 2 SELECT
queries are run as shown below:
<Many-to-many relationship>
For example, Category
and Product
models have many-to-many relationship as shown below:
# "app/models.py"
from django.db import models
class Category(models.Model):
name = models.CharField(max_length=20)
def __str__(self):
return self.name
class Product(models.Model):
categories = models.ManyToManyField(Category)
name = models.CharField(max_length=50)
price = models.DecimalField(decimal_places=2, max_digits=5)
def __str__(self):
return self.name + " " + str(self.price)
And, there are Category
and Product
admins as shown below:
# "app/admin.py
from django.contrib import admin
from .models import Category, Product
@admin.register(Category)
class CategoryAdmin(admin.ModelAdmin):
list_display = ('id', 'name', 'get_products')
ordering = ('id',)
@admin.display(description='products')
def get_products(self, obj):
return [product.name for product in obj.product_set.all()]
@admin.register(Product)
class ProductAdmin(admin.ModelAdmin):
list_display = ('id', 'name', 'price', 'get_categories')
ordering = ('id',)
@admin.display(description='categories')
def get_categories(self, obj):
return [category.name for category in obj.categories.all()]
Then, Category
admin has 5 objects as shown below:
And, Product
admin has 6 objects as shown below:
<Forward Foreign Key>
Then, I iterate Category
model from Product
model as shown below:
for obj in Product.objects.all():
print(obj.categories.all())
Then, these below are outputted on console:
<QuerySet [<Category: Fruits>, <Category: 20% OFF>]>
<QuerySet [<Category: Fruits>]>
<QuerySet [<Category: Vegetable>]>
<QuerySet [<Category: Meat>, <Category: 20% OFF>]>
<QuerySet [<Category: Meat>]>
<QuerySet [<Category: Fish>, <Category: 20% OFF>]>
Then, 7 SELECT
queries are run as shown below:
Next, I iterate Category
model from Product
model with select_related("categories")
as shown below:
for obj in Product.objects.select_related("categories").all():
print(obj.categories.all())
Then, the error below occurs because select_related("categories")
cannot be used with Reverse Foreign Key:
django.core.exceptions.FieldError: Invalid field name(s) given in
select_related: ‘categories’. Choices are: (none)
Actually, there is no error if I use select_related()
with no argument as shown below:
# ↓ No argument
for obj in Product.objects.select_related().all():
print(obj.categories.all())
Then, these below are outputted on console:
<QuerySet [<Category: Fruits>, <Category: 20% OFF>]>
<QuerySet [<Category: Fruits>]>
<QuerySet [<Category: Vegetable>]>
<QuerySet [<Category: Meat>, <Category: 20% OFF>]>
<QuerySet [<Category: Meat>]>
<QuerySet [<Category: Fish>, <Category: 20% OFF>]>
But, 7 SELECT
queries are still run instead of 1 SELECT
query as shown below:
Next, I iterate Category
model from Product
model with prefetch_related("categories")
as shown below:
for obj in Product.objects.prefetch_related("categories").all():
print(obj.categories.all())
Then, these below are outputted on console:
<QuerySet [<Category: Fruits>, <Category: 20% OFF>]>
<QuerySet [<Category: Fruits>]>
<QuerySet [<Category: Vegetable>]>
<QuerySet [<Category: Meat>, <Category: 20% OFF>]>
<QuerySet [<Category: Meat>]>
<QuerySet [<Category: Fish>, <Category: 20% OFF>]>
Then, 2 SELECT
queries are run as shown below:
<Reverse Foreign Key>
Next, I iterate Product
model from Category
model as shown below:
for obj in Category.objects.all():
print(obj.product_set.all())
Then, these below are outputted on console:
<QuerySet [<Product: Apple 10.00>, <Product: Orange 20.00>]>
<QuerySet [<Product: Carrot 30.00>]>
<QuerySet [<Product: Chicken 40.00>, <Product: Beef 50.00>]>
<QuerySet [<Product: Salmon 60.00>]>
<QuerySet [<Product: Apple 10.00>, <Product: Chicken 40.00>, <Product: Salmon 60.00>]>
Then, 6 SELECT
queries are run as shown below:
Next, I iterate Product
model from Category
model with select_related("product_set")
as shown below:
for obj in Category.objects.select_related("product_set").all():
print(obj.product_set.all())
Then, the error below occurs because select_related("categories")
cannot be used with Reverse Foreign Key:
django.core.exceptions.FieldError: Invalid field name(s) given in
select_related: ‘product_set’. Choices are: (none)
Actually, there is no error if I use select_related()
with no argument as shown below:
# ↓ No argument
for obj in Category.objects.select_related().all():
print(obj.product_set.all())
Then, these below are outputted on console:
<QuerySet [<Product: Apple 10.00>, <Product: Orange 20.00>]>
<QuerySet [<Product: Carrot 30.00>]>
<QuerySet [<Product: Chicken 40.00>, <Product: Beef 50.00>]>
<QuerySet [<Product: Salmon 60.00>]>
<QuerySet [<Product: Apple 10.00>, <Product: Chicken 40.00>, <Product: Salmon 60.00>]>
But, 6 SELECT
queries are still run instead of 1 SELECT
query as shown below:
Next, I iterate Product
model from Category
model with prefetch_related("product_set")
as shown below:
for obj in Category.objects.prefetch_related("product_set").all():
print(obj.product_set.all())
Then, these below are outputted on console:
<QuerySet [<Product: Apple 10.00>, <Product: Orange 20.00>]>
<QuerySet [<Product: Carrot 30.00>]>
<QuerySet [<Product: Chicken 40.00>, <Product: Beef 50.00>]>
<QuerySet [<Product: Salmon 60.00>]>
<QuerySet [<Product: Apple 10.00>, <Product: Chicken 40.00>, <Product: Salmon 60.00>]>
Then, 2 SELECT
queries are run as shown below:
<Additional experiments>
For example, there are Country
, State
and City
models which have one-to-many relationship as shown below:
# "app/models.py"
from django.db import models
class Country(models.Model):
name = models.CharField(max_length=20)
def __str__(self):
return self.name
class State(models.Model):
country = models.ForeignKey(Country, on_delete=models.CASCADE)
name = models.CharField(max_length=20)
def __str__(self):
return self.name
class City(models.Model):
state = models.ForeignKey(State, on_delete=models.CASCADE)
name = models.CharField(max_length=20)
def __str__(self):
return self.name
And, there are Country
, State
and City
admins as shown below:
# "app/admin.py
from django.contrib import admin
from .models import Country, State, City
@admin.register(Country)
class CountryAdmin(admin.ModelAdmin):
list_display = ('id', 'name')
ordering = ('id',)
@admin.register(State)
class StateAdmin(admin.ModelAdmin):
list_display = ('id', 'name', 'get_country')
ordering = ('id',)
@admin.display(description='country')
def get_country(self, obj):
return obj.country
@admin.register(City)
class CityAdmin(admin.ModelAdmin):
list_display = ('id', 'name', 'get_state', 'get_country')
ordering = ('id',)
@admin.display(description='state')
def get_state(self, obj):
print(obj)
return obj.state
@admin.display(description='country')
def get_country(self, obj):
return obj.state.country
Then, Country
admin has 2 objects as shown below:
And, State
admin has 3 objects as shown below:
Then, City
admin has 6 objects as shown below:
<Forward Foreign Key>
Then, I iterate Country
model from City
model with select_related("state__country")
as shown below:
for obj in City.objects.all().select_related("state__country"):
print(obj.state.country)
Then, these below are outputted on console:
USA
USA
USA
USA
Japan
Japan
Then, 1 SELECT
query is run as shown below:
Next, I iterate Country
model from City
model with prefetch_related("state__country")
as shown below:
for obj in City.objects.all().prefetch_related("state__country"):
print(obj.state.country)
Then, these below are outputted on console:
USA
USA
USA
USA
Japan
Japan
Then, 3 SELECT
queries are run as shown below:
<Reverse Foreign Key>
Next, I iterate City
model from Country
model with prefetch_related("state_set__city_set")
as shown below:
for country_obj in Country.objects.all().prefetch_related("state_set__city_set"):
for state_obj in country_obj.state_set.all():
for city_obj in state_obj.city_set.all():
print(city_obj)
Then, these below are outputted on console:
San Francisco
Los Angeles
San Diego
Kansas City
Ginza
Akihabara
Then, 3 SELECT
queries are run as shown below:
Next, I iterate City
model from Country
model with prefetch_related("state_set__city_set")
with filter() instead of all() as shown below. *prefetch_related()
with filter()
doesn’t work:
# Here
for country_obj in Country.objects.filter().prefetch_related("state_set__city_set"):
for state_obj in country_obj.state_set.filter(): # Here
for city_obj in state_obj.city_set.filter(): # Here
print(city_obj)
Then, these below are outputted on console:
San Francisco
Los Angeles
San Diego
Kansas City
Ginza
Akihabara
Then, 8 SELECT
queries are run as shown below:
So to reduce 8 SELECT
queries, I need to use Prefetch() with filter()
but I only know how to iterate State
model from Country
model with Prefetch()
and filter()
as shown below. *I asked the question on Stack Overflow about how to iterate City
model from Country
model with Prefetch()
and filter()
:
for country_obj in Country.objects.filter().prefetch_related(
Prefetch('state_set', # Here
queryset=State.objects.filter(),
to_attr='state_obj'
)
):
print(country_obj.state_obj)
Then, these below are outputted on console:
[<State: California>, <State: Missouri>]
[<State: Tokyo>]
Then, 2 SELECT
queries are run as shown below:
- [Django]-Warning: Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'
- [Django]-Switching to PostgreSQL fails loading datadump
- [Django]-Using django-rest-interface
4👍
Don’t be confused
**select_related:** Use for ForeignKey relationship,
**prefetch_related:** Use for ManyToManyField relationship or reverse ForeignKey.
They doing the same thing reducing the number of query
For example:
class ExampleClassA(models.Model):
title = models.CharField(max_length=50)
class ExampleClassB(models.Model):
example_class_a = models.ForeignKey(ExampleClassA,
on_delete=models.CASCADE)
objects = ExampleClassB.objects.all()
for obj in objects:
print(obj.example_class_a.title)
Number of query(access related field): N+1 (# n is number of ExampleClassA’s object)
if we use this query:
objects = ExampleClassB.objects.select_related('example_class_a').all()
Number of query is only one.
- [Django]-Django: why i can't get the tracebacks (in case of error) when i run LiveServerTestCase tests?
- [Django]-Name '_' is not defined
- [Django]-How can I create a deep clone of a DB object in Django?
4👍
Let me try to show you how Django is going to make db calls in select_related and prefetch_related
class a(models.Model):
name = models.CharField(max_length=100)
class b(models.Model):
name = models.CharField(max_length=100)
a = models.ForeignKey(A, on_delete=models.CASCADE)
#select_related query->
b.objects.select_related('a').first()
SQL query executed for this would be
SELECT * FROM "b" LEFT OUTER JOIN "a" ON ("b"."a_id" = "a"."id") LIMIT 1
Here Django will get "a" model details using JOIN
#prefetch_related query->
B.objects.prefetch_related('a').first()
SQL query executed for this would be
SELECT * FROM "b" LIMIT 1
SELECT * FROM "a" WHERE "a"."id" IN (ids collected from above query)
Here Django will execute two SQL query and merge them through python
- [Django]-How to access request body when using Django Rest Framework and avoid getting RawPostDataException
- [Django]-How to use pdb.set_trace() in a Django unittest?
- [Django]-No handlers could be found for logger