[Django]-How to iterate a large table in Django without running out of memory?

5👍

The first thing to try is using the iterator() method on the queryset before iterating over it:

for ii in MyModel.objects.all().iterator():
    ...  # process each row without building the full result list in memory
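On Django 2.0+ you can also pass a chunk_size argument to iterator() to control how many rows are fetched from the database per batch. A minimal sketch:

    # chunk_size (Django 2.0+) sets how many rows are pulled per round trip
    for ii in MyModel.objects.all().iterator(chunk_size=2000):
        ...  # process each row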

1👍

The correct answer is to use Django's iterator() method on the queryset before iterating over it. However, you must also wrap your query in a transaction.

    from django.db import transaction

    with transaction.atomic():
        for ii in MyModel.objects.all().iterator():
            ...  # process each row inside the transaction

This is because Django operates in "autocommit" mode by default, which means its server-side cursors are declared WITH HOLD; that causes common databases such as PostgreSQL to materialize the result set into a large temporary file instead of streaming it.
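Putting both pieces together, a minimal sketch of a batch job (process_row() is a hypothetical handler standing in for your own per-row logic):

    from django.db import transaction

    def iterate_without_tempfile():
        # Inside the transaction the cursor is no longer WITH HOLD,
        # so PostgreSQL can stream rows instead of spooling them to disk.
        with transaction.atomic():
            for ii in MyModel.objects.all().iterator(chunk_size=2000):
                process_row(ii)  # hypothetical per-row handler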

0👍

If you are using Python 3.x, you can try an asynchronous approach.

It may be useful to schedule the fetch as a coroutine.

Something like this:

from asgiref.sync import sync_to_async

async def _fetch_all(self):
    if self._result_cache is None:
        # list() blocks, so run it in a worker thread via sync_to_async
        self._result_cache = await sync_to_async(list)(self.iterator())  # <<<< this guy!
    if self._prefetch_related_lookups and not self._prefetch_done:
        await sync_to_async(self._prefetch_related_objects)()

To run it (assuming the override above lives on your queryset class):

import asyncio

queryset = MyModel.objects.all()
asyncio.get_event_loop().run_until_complete(queryset._fetch_all())

But if you are still on Python 2.7, you will need to create an async task with Celery instead, or try a tool such as Django Async.
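For reference, a minimal sketch of such a Celery task, assuming a configured Celery project and an app module named myapp (both are placeholders):

    from celery import shared_task

    from myapp.models import MyModel  # placeholder app path

    @shared_task
    def iterate_my_model():
        # The Celery worker does the heavy iteration outside
        # the request/response cycle.
        for ii in MyModel.objects.all().iterator():
            pass  # process each row here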

Hope it helps.
