1👍
Let’s start off with the basics
A serializer can only work with the data it is given
So this means that in order to get a serializer which can serialize a list of ItemGroup and Item objects in a nested representation, it has to be given that list in the first place. You’ve accomplished that so far using a query on the ItemGroup model that calls prefetch_related to get the related Item objects. You’ve also identified that prefetch_related triggers a second query to get those related objects, and this isn’t satisfactory.
prefetch_related is used to get multiple related objects
What does this mean exactly? When you are querying for a single object, like a single ItemGroup, you use prefetch_related to get a relationship containing multiple related objects, like a reverse foreign key (one-to-many) or a many-to-many relationship that’s been defined. Django intentionally uses a second query to get these objects for a few reasons:
- The join that would be required in a select_related is often non-performant when you force it to do a join against a second table. This is because a right outer join would be required in order to ensure that no ItemGroup objects that do not contain an Item are missed.
- The query used by prefetch_related is an IN on an indexed primary key field, which is one of the most performant queries out there.
- The query only requests the IDs of Item objects it knows exist, so it can efficiently handle duplicates (in the case of many-to-many relationships) without having to do an additional subquery.
All of this is a way to say: prefetch_related is doing exactly what it should do, and it’s doing that for a reason.
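To make the two-query pattern concrete, here is a minimal sketch of the same idea written against plain sqlite3. The table and column names are hypothetical stand-ins for the ItemGroup and Item models, and this is only an illustration of the pattern, not Django's actual implementation.

```python
import sqlite3

# Hypothetical schema standing in for the ItemGroup and Item models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_group (id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        item_name TEXT,
        item_group_id INTEGER REFERENCES item_group(id)
    );
    INSERT INTO item_group VALUES (1, 'group #1'), (2, 'group #2'), (3, 'empty group');
    INSERT INTO item VALUES (1, 'g1 #1', 1), (2, 'g1 #2', 1), (3, 'first item', 2);
""")


def fetch_groups_with_items(conn):
    # Query 1: the parent rows, including groups that have no items.
    groups = {
        group_id: {"group_name": name, "items": []}
        for group_id, name in conn.execute(
            "SELECT id, group_name FROM item_group ORDER BY id"
        )
    }
    if not groups:
        return []
    # Query 2: every child row whose foreign key is IN the set of parent IDs.
    placeholders = ", ".join("?" for _ in groups)
    rows = conn.execute(
        "SELECT item_name, item_group_id FROM item "
        f"WHERE item_group_id IN ({placeholders}) ORDER BY id",
        list(groups),
    )
    # Stitch the children onto their parents in Python.
    for item_name, group_id in rows:
        groups[group_id]["items"].append(item_name)
    return list(groups.values())


result = fetch_groups_with_items(conn)
```

Note that the group without any items still shows up in the result with an empty list, which is exactly the case a single joined query has to work harder to preserve.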
But I want to do this with a select_related anyway
Alright, alright. That’s what was asked for, so let’s see what can be done.
There are a few ways to accomplish this, all of which have their pros and cons and none of which work without some manual "stitching" work in the end. I am making the assumption that you aren’t using the built-in ViewSet or generic views provided by DRF, but if you are then the stitching must happen in the filter_queryset method to allow the built-in filtering to work. Oh, and it probably breaks pagination or makes it almost useless.
Preserving the original filters
The original set of filters is being applied to the ItemGroup object. And since this is being used in an API, these are probably dynamic and you don’t want to lose them. So, you are going to need to apply filters in one of two ways:
- Generate the filters and then prefix them with the related name
  So you would generate your normal foo=bar filters and then prefix them before passing them to filter(), so it’d be related__foo=bar. This may have some performance implications since you’re now filtering across relationships.
- Generate the original subquery and then pass it to the Item query directly
  This is probably the "cleanest" solution, except you’re generating an IN query with comparable performance to the prefetch_related one. Except it’s worse performance, since this is treated as an uncacheable subquery instead.
Implementing either of these is realistically out of the scope of this question, since we want to be able to "flip and stitch" the Item and ItemGroup objects so the serializer works.
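A full implementation is out of scope here, but the prefixing step from the first option is small enough to sketch. The helper name prefix_filters and the example lookups are made up for illustration, and this only covers filters expressed as a plain dict of keyword lookups (not Q objects).

```python
def prefix_filters(filters, relation):
    """Return a copy of `filters` with every lookup prefixed by `relation__`."""
    return {f"{relation}__{lookup}": value for lookup, value in filters.items()}


# Hypothetical filters that were originally built for the ItemGroup query.
group_filters = {"group_name__startswith": "group", "active": True}

# The same filters, rewritten so they can be applied from the Item side.
item_filters = prefix_filters(group_filters, "item_group")

# With Django available, this would then be used roughly as:
#   Item.objects.filter(**item_filters).select_related("item_group")
```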
Flipping the Item query so you get a list of ItemGroup objects
Taking the query given in the original question, where select_related is being used to grab all of the ItemGroup objects alongside the Item objects, you are returned a queryset full of Item objects. We actually want a list of ItemGroup objects, since we’re working with an ItemGroupSerializer, so we’re going to have to "flip it" around.
from collections import defaultdict

# `filters` is the dict of lookups to apply to the Item query.
items = Item.objects.filter(**filters).select_related('item_group')

item_group_to_items = defaultdict(list)  # ItemGroup id -> list of its Item objects
item_groups_by_id = {}                   # ItemGroup id -> ItemGroup object

for item in items:
    item_group = item.item_group
    item_groups_by_id[item_group.id] = item_group
    item_group_to_items[item_group.id].append(item)
I am intentionally using the id of the ItemGroup as the key for the dictionaries since most Django models are not immutable, and sometimes people override the hashing method to be something other than the primary key.
This will get you a mapping of ItemGroup objects to their related Item objects, which is ultimately what you need in order to "stitch" them together again.
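As a quick illustration of why keying on the id is the safer choice, here is a small sketch with a hypothetical Group class (a plain Python stand-in, not a Django model) whose equality and hashing were overridden to use the name instead of the primary key.

```python
class Group:
    """Hypothetical stand-in for a model whose hashing was overridden."""

    def __init__(self, pk, name):
        self.id = pk
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Group) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


# Two distinct rows that happen to share a name.
a = Group(1, "misc")
b = Group(2, "misc")

# Keyed by object: b is treated as the same key as a, so the buckets merge.
by_object = {a: ["item 1"]}
by_object.setdefault(b, []).append("item 2")

# Keyed by id: the two groups stay separate.
by_id = {a.id: ["item 1"]}
by_id.setdefault(b.id, []).append("item 2")
```

Keyed by object, both items end up under a single group; keyed by id, each group keeps its own list.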
Stitching the ItemGroup objects back with their related Item objects
This part isn’t actually difficult to do, since you have all of the related objects already.
for item_group_id, item_group_items in item_group_to_items.items():
    item_group = item_groups_by_id[item_group_id]
    item_group.item_set = item_group_items

# Materialize the dict view so item_groups is a real list.
item_groups = list(item_groups_by_id.values())
This will get you all of the ItemGroup objects that were requested and have them stored as a list in the item_groups variable. Each ItemGroup object will have the list of related Item objects set in the item_set attribute. You may want to rename this so it doesn’t conflict with the automatically generated reverse foreign key of the same name; on Django 2.0 and later, assigning directly to a reverse relation raises a TypeError, so there a different attribute name is required.
From here, you can use it as you normally would in your ItemGroupSerializer and it should work for serialization.
Bonus: A generic way to "flip and stitch"
You can make this generic (and unreadable) pretty quickly, for use in other similar scenarios:
from collections import defaultdict

def flip_and_stitch(items, group_from_item, store_in):
    item_group_to_items = defaultdict(list)
    item_groups_by_id = {}
    # Flip: bucket every item under the id of its related group.
    for item in items:
        item_group = getattr(item, group_from_item)
        item_groups_by_id[item_group.id] = item_group
        item_group_to_items[item_group.id].append(item)
    # Stitch: attach each bucket to its group under the requested attribute.
    for item_group_id, item_group_items in item_group_to_items.items():
        item_group = item_groups_by_id[item_group_id]
        setattr(item_group, store_in, item_group_items)
    return list(item_groups_by_id.values())
And you’d just call this as
item_groups = flip_and_stitch(items, 'item_group', 'item_set')
Where:
- items is the queryset of items that you requested originally, with the select_related call already applied.
- item_group is the attribute on the Item object where the related ItemGroup is stored.
- item_set is the attribute on the ItemGroup object where the list of related Item objects should be stored.
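If you want to try the helper without a database, here is a self-contained sketch that runs it against plain Python objects. The make_item helper, the sample data and the stitched_items attribute name are all made up for illustration; each item gets its own group object, the way rows coming back from a joined query would.

```python
from collections import defaultdict
from types import SimpleNamespace


def flip_and_stitch(items, group_from_item, store_in):
    groups_by_id = {}
    items_by_group_id = defaultdict(list)
    # Flip: bucket every item under the id of its related group.
    for item in items:
        group = getattr(item, group_from_item)
        groups_by_id[group.id] = group
        items_by_group_id[group.id].append(item)
    # Stitch: attach each bucket to the group object kept for that id.
    for group_id, group_items in items_by_group_id.items():
        setattr(groups_by_id[group_id], store_in, group_items)
    return list(groups_by_id.values())


def make_item(name, group_id, group_name):
    # Hypothetical stand-in for an Item row with its related group attached.
    group = SimpleNamespace(id=group_id, group_name=group_name)
    return SimpleNamespace(item_name=name, item_group=group)


items = [
    make_item("g1 #1", 1, "group #1"),
    make_item("first item", 2, "group #2"),
    make_item("g1 #2", 1, "group #1"),
]

item_groups = flip_and_stitch(items, "item_group", "stitched_items")
stitched = {
    group.group_name: [item.item_name for item in group.stitched_items]
    for group in item_groups
}
```

Because the buckets are keyed by id, the input does not need to be ordered by group for this to work.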
8👍
Using prefetch_related you will have two queries plus the big IN clause issue, although it is proven and portable.
Here is a solution that is more of an example, based on your field names. It starts from a serializer for Item that uses your select_related queryset, then overrides the list method of the view to transform the data from that serializer into the shape of a second serializer, which gives you the representation you want. It uses only one query, and parsing the results is O(n), so it should be fast. Note that get_data only merges consecutive rows, so the queryset needs to return items ordered by group.
You might need to refactor get_data in order to add more fields to your results.
from rest_framework import serializers

# Item is the model from the question; import it from your own models module.

class ItemSerializer(serializers.ModelSerializer):
    group_name = serializers.CharField(source='item_group.group_name')

    class Meta:
        model = Item
        fields = ('item_name', 'group_name')


class ItemGSerializer(serializers.Serializer):
    group_name = serializers.CharField(max_length=50)
    items = serializers.ListField(child=serializers.CharField(max_length=50))
In the view:
from rest_framework import viewsets
from rest_framework.response import Response

# `models` and `serializers` here are this app's own modules.

class ItemGroupViewSet(viewsets.ModelViewSet):
    model = models.Item
    serializer_class = serializers.ItemSerializer
    queryset = models.Item.objects.select_related('item_group').all()

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())

        page = self.paginate_queryset(queryset)
        if page is not None:
            serializer = self.get_serializer(page, many=True)
            data = self.get_data(serializer.data)
            s = serializers.ItemGSerializer(data, many=True)
            return self.get_paginated_response(s.data)

        serializer = self.get_serializer(queryset, many=True)
        data = self.get_data(serializer.data)
        s = serializers.ItemGSerializer(data, many=True)
        return Response(s.data)

    @staticmethod
    def get_data(data):
        # Merge consecutive rows that share a group_name into one entry.
        result, current_group = [], None
        for elem in data:
            if current_group is None:
                current_group = {'group_name': elem['group_name'], 'items': [elem['item_name']]}
            elif elem['group_name'] == current_group['group_name']:
                current_group['items'].append(elem['item_name'])
            else:
                result.append(current_group)
                current_group = {'group_name': elem['group_name'], 'items': [elem['item_name']]}
        if current_group is not None:
            result.append(current_group)
        return result
Here is my result with my fake data:
[
    {
        "group_name": "group #2",
        "items": [
            "first item",
            "2 item",
            "3 item"
        ]
    },
    {
        "group_name": "group #1",
        "items": [
            "g1 #1",
            "g1 #2",
            "g1 #3"
        ]
    }
]
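Since get_data only merges rows that sit next to each other, here is a small alternative sketch that groups by name regardless of row order. The function name group_rows and the sample rows are made up for illustration; it is a plain-Python variation on get_data, not part of DRF.

```python
def group_rows(rows):
    """Group serialized item rows by group_name, keeping first-seen group order."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["group_name"], []).append(row["item_name"])
    return [{"group_name": name, "items": items} for name, items in grouped.items()]


# Rows deliberately interleaved to show that ordering does not matter here.
rows = [
    {"item_name": "first item", "group_name": "group #2"},
    {"item_name": "g1 #1", "group_name": "group #1"},
    {"item_name": "2 item", "group_name": "group #2"},
]

grouped = group_rows(rows)
```

This stays O(n) and relies on dicts preserving insertion order (Python 3.7+). It does not help with pagination, though: a group whose items straddle a page boundary will still be split across pages.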