[Django]-Django: Retrieving IDs of manyToMany fields quickly

25👍

According you have Three instances. You can use the values_list method to retrieve just the results and from this result get just the ID’s of their related instances.
I use the pk field to be my filter because i don’t know your scheme, but you can use anything, just must be a QuerySet.

>>> result = A.objects.filter(pk=1)
>>> result.values('related__id')
[{'id': 2}, {'id': 3}]
>>> result.values_list('related__id')
[(2,), (3,)]
>>> result.values_list('related__id', flat=True)
[2, 3]

1👍

You can get pretty close like this:

qs = A.objects.prefetch_related(Prefetch(
                      'related', 
                      queryset=A.objects.only('pk'), 
                      to_attr='related_insts')).in_bulk(my_list_of_pks)

This will give a mapping from pks of the current object to the instance itself, so you can iterate through as follows:

for pk, inst in qs.iteritems():
  related_ids = (related.pk for related in inst.related_insts)

Or given an instance, you can do a fast lookup like so:

related_ids = (related.pk for related in qs[instance.pk]).

This method maps the instance ids to the related ids (indirectly) since you specifically requested a dictionary. If you aren’t doing lookups, you may want the following instead:

qs = A.objects.prefetch_related(Prefetch(
        'related', 
        queryset=A.objects.only('pk'), 
        to_attr='related_insts')).filter(pk__in=my_list_of_pks)
for inst in qs:
  related_ids = (related.pk for related in inst.related_insts)

You may take note of the use of only to only pull the pks from the db. There is an open ticket to allow the use of values and (I presume) values_list in Prefetch queries. This would allow you to do the following.

qs = A.objects.prefetch_related(Prefetch(
        'related', 
        queryset=A.objects.values_list('pk', flat=True), 
        to_attr='related_ids')).filter(pk__in=my_list_of_pks)
for inst in qs:
  related_ids = inst.related_ids

You could of course optimize further, for example by using qs.only('related_insts') on the primary queryset, but make sure you aren’t doing anything with these instances– they’re essentially just expensive containers to hold your related_ids.

I believe this is the best that’s available for now (without custom queries). To get to exactly what you want, two things are needed:

  1. The feature above is implemented
  2. values_list is made to work with Prefetch to_attr like it does for annotations.

With these two things in place (and continuing the above example) you could do the following to get exactly what you requested:

d = qs.values_list('related_ids', flat=True).in_bulk()
for pk, related_pks in d.items():
    print 'Containing Objects %s' % pk
    print 'Related objects %s' % related_pks
# And lookups
print 'Object %d has related objects %s' % (20, d[20])

I’ve left off some details explaining things, but it should be pretty clear from the documentation. If you need any clarification, don’t hesitate!

0👍

If you’re using Postgres:

from django.contrib.postgres.aggregates import ArrayAgg

qs = A.objects.filter(pk__in=[1,2,6]).annotate(related_ids=ArrayAgg('related')).only('id')
mapping = {a.id: a.related_ids for a in qs}

You can also use filter/ordering in the ArrayAgg.

Leave a comment