Celery/Django duplicate key violations

If you are creating these records in bulk from multiple threads, the IntegrityErrors are very likely caused by different threads inserting the same data. Do you really need multiple threads working on this? If so, you could try:

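create_domains = []
create_domain_ids = []

for x in dup_free_scrape:
    # Normalize once so the lookup and the bookkeeping use the same value.
    domain = x['domain'].lower()
    # get_or_create returns (object, created); created is True only for a new row.
    new_domain, created = Domain.objects.get_or_create(name=domain)
    if created:
        create_domains.append(domain)
        create_domain_ids.append(new_domain.pk)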

Note that this is all the code you need. The select-all query you had right at the start is not needed: Domain.objects.all() is very inefficient there because it reads the entire table.

Also note that your list comprehension for x['domain'] appears to be completely redundant.

The create_domains and create_domain_ids lists may not be needed either, unless you want to keep track of what was created.

Please make sure that you have a unique index on the domain name, so that uniqueness is enforced at the database level. From the get_or_create docs:

This method is atomic assuming correct usage, correct database
configuration, and correct behavior of the underlying database.
However, if uniqueness is not enforced at the database level for the
kwargs used in a get_or_create call (see unique or unique_together),
this method is prone to a race-condition which can result in multiple
rows with the same parameters being inserted simultaneously.
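To see why the database-level constraint matters, here is a minimal sketch of the same select-then-insert pattern using the standard-library sqlite3 module rather than the Django ORM. The `domain` table and the `get_or_create_domain` helper are made up for illustration and are not part of your code or of Django. In a Django model the equivalent constraint would be `unique=True` on the name field.

```python
import sqlite3


def get_or_create_domain(conn, name):
    """Return (row_id, created) for a domain name (hypothetical helper)."""
    name = name.lower()
    row = conn.execute("SELECT id FROM domain WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0], False
    try:
        cur = conn.execute("INSERT INTO domain (name) VALUES (?)", (name,))
        conn.commit()
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another writer inserted the same name between our SELECT and INSERT.
        # The UNIQUE constraint rejected our row, so fetch the existing one.
        conn.rollback()
        row = conn.execute("SELECT id FROM domain WHERE name = ?", (name,)).fetchone()
        return row[0], False


conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is what makes duplicate rows impossible.
conn.execute("CREATE TABLE domain (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")

first_id, first_created = get_or_create_domain(conn, "Example.com")
second_id, second_created = get_or_create_domain(conn, "example.com")

# A raw duplicate insert is rejected by the database itself.
try:
    conn.execute("INSERT INTO domain (name) VALUES (?)", ("example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM domain").fetchone()[0]
```

Without the UNIQUE constraint the raw insert would succeed, and two concurrent workers could each pass the SELECT check and insert the same name.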

👤e4c5