54π
The simplest way is the simplest way! Especially for one off scripts where performance doesnβt even matter (unless it does). Since itβs not core code, Iβd just write the first thing that comes to mind and works.
# assuming which duplicate is removed doesn't matter...
for row in MyModel.objects.all().reverse():
if MyModel.objects.filter(photo_id=row.photo_id).count() > 1:
row.delete()
Use .reverse()
to delete the duplicates first and keep the first instance of it, rather than the last.
As always, back up before you do this stuff.
25π
This may be faster because it avoids the inner filter for each row in MyModel.
Since the ids are unique, if the models are sorted by them in increasing order, we can keep track of the last id we saw and as we walk over the rows if we see a model with the same id, it must be a duplicate, so we can delete it.
lastSeenId = float('-Inf')
rows = MyModel.objects.all().order_by('photo_id')
for row in rows:
if row.photo_id == lastSeenId:
row.delete() # We've seen this id in a previous row
else: # New id found, save it and check future rows for duplicates.
lastSeenId = row.photo_id
- [Django]-Django: return string from view
- [Django]-Django- Get Foreign Key Model
- [Django]-Django-reversion and related model
4π
Here is a fast solution:
from django.db import connection
query = "SELECT id FROM table_name GROUP BY unique_column HAVING COUNT(unique_column)>1"
cursor = connection.cursor()
cursor.execute(query)
ids_list = [item[0] for item in cursor.fetchall()]
now you can do:
Some_Model.objects.filter(id__in=ids_list).delete()
or if ids_list
was too huge to be handled by your dbms
you can segment it to chunks that can be handled by it:
seg_length = 100
ids_lists = [ids_list[x:x+seg_length] for x in range(0,len(ids_list),seg_length)]
for ids_list in ids_lists:
SomeModel.objects.filter(id__in=ids_list).delete()
- [Django]-Suppress "?next=blah" behavior in django's login_required decorator
- [Django]-Python string to Django timezone (aware datetime)
- [Django]-Pypi see older versions of package
2π
I know itβs an old question, but if you really need performance and your table has +500k rows, this might help:
last_used_id = 0
your_ids = list(MyModel.objects
.order_by("your_id")
.values_list('your_id',flat=True)
.annotate(name_count=Count('your_id'))
.filter(name_count__gt=1)[:1000] # for example
)
duplicates = MyModel.objects
.filter(your_id__in=your_ids)
.order_by("your_id")
for my_model in duplicates:
if my_model.your_id == last_used_id:
my_model.delete()
else:
last_used_id = my_model.your_id
- [Django]-With DEBUG=False, how can I log django exceptions to a log file
- [Django]-Django: detect admin login in view or template
- [Django]-Postgresql DROP TABLE doesn't work
1π
from Django 1.11, you can use
MyModel.objects.annotate(
count=Subquery(
MyModel.objects.filter(
photo_id=OuterRef('photo_id')
).values(
'photo_id'
).annotate(
count=Count('pk')
).values('count')
)
).filter(
count__gt=1
)
This query will give you the rows that donβt have a unique photo_id, you can then keep one row of each photo id and delete the rest
- [Django]-How to add annotate data in django-rest-framework queryset responses?
- [Django]-How can I access the form submit button value in Django?
- [Django]-Access request in django custom template tags
0π
Instead of iterating the whole table you can just do
count = MyModel.objects.filter(photo_id='some_photo_id').count()
while count >=1:
MyModel.objects.filter(photo_id='some_photo_id')[0].delete()
count -= 1
- [Django]-Django- nginx: [emerg] open() "/etc/nginx/proxy_params" failed (2: No such file or directory) in /etc/nginx/sites-enabled/myproject:11
- [Django]-How to use Django ImageField, and why use it at all?
- [Django]-Django dynamically filtering with q objects
0π
A general and optimized method in case large number of objects need to be deleted β
qs = Model.objects.all()
key_set = set()
delete_ids_list = []
for object in qs:
object_key = object.unique_key # photo_id here
if object_key in key_set:
delete_ids_list.append(object.id)
else:
key_set.add(object_key)
Model.objects.filter(id__in=delete_ids_list).delete()
- [Django]-Django β Website Home Page
- [Django]-Django forms, inheritance and order of form fields
- [Django]-What is the function of the name parameter in django.urls.path?
0π
Combining some of the other answers along with a Window function allows you to annotate the row number. Itβs important to use a partition otherwise the row numbers will just be sequential. Iterate through the queryset and add any object where the row number is not the first occurrence to a list of ids that can be later deleted in one query.
from django.db.models import Count, F, IntegerField, OuterRef, Subquery, Window
from django.db.models.functions import RowNumber
from group.models import BuyingGroupTemplate
templates = (
BuyingGroupTemplate.objects
.order_by('group', 'product')
.annotate(
count=Subquery(
BuyingGroupTemplate.objects
.filter(
group=OuterRef('group'),
product=OuterRef('product'),
)
.values('group', 'product')
.annotate(count=Count('*'))
.values('count'),
output_field=IntegerField(),
),
row_number=Window(
expression=RowNumber(),
partition_by=[F('group'), F('product')],
order_by=F('id').asc()
),
)
.filter(count__gt=1)
)
ids_to_delete = list()
for template in templates:
if template.row_number != 1:
ids_to_delete.append(template.id)
BuyingGroupTemplate.objects.filter(id__in=ids_to_delete).delete()
- [Django]-Django: "TypeError: [] is not JSON serializable" Why?
- [Django]-How should I use DurationField in my model?
- [Django]-How do I add a Foreign Key Field to a ModelForm in Django?
0π
If you have lots of records, efficient way:
import itertools
pks_to_delete = []
rows = MyModel.objects.values_list('pk', 'photo_id').order_by('photo_id')
filter_func = lambda x: x[1]
for key, group in itertools.groupby(rows.iterator(), filter_func):
pks_to_delete.extend((i[0] for i in list(group)[1:]))
MyModel.objects.filter(pk__in=pks_to_delete).delete()
- [Django]-Django β Getting last object created, simultaneous filters
- [Django]-Install mod_wsgi on Ubuntu with Python 3.6, Apache 2.4, and Django 1.11
- [Django]-Charts in django Web Applications