The most basic and effective optimization in Django is to reduce the number of queries to the database. That’s true for 100 queries, and that’s most certainly true for 500,000 queries.
Instead of using MastTable.objects.create(), you should construct a list of unsaved model instances and use MastTable.objects.bulk_create(list_of_models) to create them all in as few round-trips to the database as possible. This should speed it up tremendously.
If you’re using MySQL, you can increase the max_allowed_packet setting to allow for larger batches. On older MySQL versions its default is only 1MB, which is quite low; newer versions ship with a larger default. PostgreSQL has no equivalent packet-size setting.
If you’re still running into performance issues, you can switch to raw SQL statements, since creating 500,000 Python objects adds a noticeable overhead. In one of my recent tests, executing the exact same query through connection.cursor was about 20% faster.
It can be a good idea to leave the actual processing of the file to a background process, e.g. using Celery, or to use a StreamingHttpResponse to provide feedback during the process.
Does this CSV file contain invalid rows? In other words, do you really need this line?
except Exception as e: #This logs any exceptions to a custom DB table
If no such errors are raised, you should use bulk_create() instead of create().
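If the file can contain bad rows, one option (an assumption about the data, not the asker's code) is to validate every row in plain Python first and then hand only the clean rows to a single bulk_create(), so a per-row try/except no longer forces one create() per row. The two-column name/height layout and the split_rows() helper are hypothetical.

```python
import csv
import io


def split_rows(lines):
    """Parse CSV rows up front, separating valid rows from bad ones.

    Assumes a hypothetical two-column layout: name, height.
    """
    valid, errors = [], []
    for line_no, row in enumerate(csv.reader(lines), start=1):
        try:
            name, height = row  # wrong column count raises ValueError
            valid.append((name, float(height)))  # non-numeric height raises ValueError
        except ValueError as e:
            errors.append((line_no, str(e)))
    return valid, errors


data = io.StringIO("a,1.5\nb,oops\nc,2\n")
valid, errors = split_rows(data)
print(valid)
print(errors)
```

The valid tuples can then be turned into unsaved model instances for one bulk_create(), and the collected errors can be logged in one go instead of one database write per failure.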
I also suggest running the import in a single transaction. It’s a HUGE speed boost:
from django.db import transaction

# Commit everything once at the end instead of after every create().
with transaction.atomic():
    processor_table(extract_properties)