[Django]-How to find duplicate records based on certain fields in Django

It turns out it is not possible to perform conditional operations on the annotate function of a query set.

What I did was to override the get_context_data function, then get the duplicate keys. The returned object was a queryset so I took all the IDs and then put them in a list, then stored them to context which was made available in the template view.

This is what my get_context_data function looks like, if it could be further improved please let me know.

def get_context_data(self, **kwargs):
    ctx = super(fileList, self).get_context_data(**kwargs)

    qs = file.objects.annotate(
            dupe_id=Concat(
                        F('case_no')
                        , F('hearing_date')
                        , output_field=CharField()
            )
        )

    dupes = qs.values('dupe_id').annotate(dupe_count=Count('dupe_id')).filter(dupe_count__gt=1)

    dupe_keys = []
    for dupe in dupes:
        dupe_keys.append(dupe['dupe_id'])

    ctx['dupe_keys'] = dupe_keys

    return ctx

Now on the template view, on the for loop of the queryset, I just created another column which checks if the id in the queryset is visible in the list of duplicates, then the record will have a special tagging of duplicate or the cell will be highlighted to something visible to the user.

<td>{% if object.dupe_id in dupe_keys %} duplicate {% else %} not duplicate {% endif %}</td>

chip

[Django]-How to see exceptions raised from a channels consumer

One of the ways to work on duplicates is to use the algorithm of:
GroupBy in SQL > Find Duplicates > loop over duplicates

from django.db.models import Max, Count

# Getting duplicate files based on case_no and hearing_date
files = File.objects.values('case_no', 'hearing_date') \
    .annotate(records=Count('case_no')) \
    .filter(records__gt=1)

# Check the generated group by query
print files.query

# Then do operations on duplicates
for file in files:
    File.objects.filter(
        case_no=file['case_no'],
        hearing_date=file['hearing_date']
    )[1:].update(duplicate=True)

Pratyush

Source:stackexchange.com

Leave a comment Cancel reply