160π
This query will not give you duplicate rows: it returns every row in the database, ordered by email.
However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field, you also have an automatic id field, so the combination of id+email is always unique.
Assuming you only need one field, email, de-duplicated, you can do this:
email_list = Email.objects.values_list('email', flat=True).distinct()
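To make the id+email point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the Django ORM. The table name, column names, and sample rows are assumptions chosen only for illustration; the two SELECT statements are roughly what a DISTINCT over whole rows and a DISTINCT over the single email column boil down to.

```python
import sqlite3

# Hypothetical table and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)

# DISTINCT over id + email: every row stays, because each id is different.
full_rows = conn.execute("SELECT DISTINCT id, email FROM email").fetchall()

# DISTINCT over email only: the repeated address collapses to one row.
emails_only = conn.execute(
    "SELECT DISTINCT email FROM email ORDER BY email"
).fetchall()

print(len(full_rows))
print(emails_only)
```

Selecting only the column you care about is what lets DISTINCT do any work here.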
However, you should really fix the root problem, and remove the duplicate data from your database.
Example, deleting duplicate Emails by email field:
for email in Email.objects.values_list('email', flat=True).distinct():
    Email.objects.filter(pk__in=Email.objects.filter(email=email).values_list('id', flat=True)[1:]).delete()
Or books by name:
for name in Book.objects.values_list('name', flat=True).distinct():
    Book.objects.filter(pk__in=Book.objects.filter(name=name).values_list('id', flat=True)[1:]).delete()
19π
To check for duplicates, you can do the equivalent of GROUP BY and HAVING in Django, as below. We are using Django annotations here.
from django.db.models import Count
from app.models import Email
duplicate_emails = Email.objects.values('email').annotate(email_count=Count('email')).filter(email_count__gt=1)
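Now loop through the above data and delete all emails except the first one (adjust this to your requirements).
for data in duplicate_emails:
    email = data['email']
    # A sliced queryset cannot be deleted directly, so collect the extra pks first.
    extra_pks = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=extra_pks).delete()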
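For readers who want to see the GROUP BY / HAVING idea outside the ORM, the following is a rough sketch with the standard-library sqlite3 module. The table layout and sample data are assumptions, and the SQL is only an approximation of what the annotation asks the database for, not the exact statement Django emits.

```python
import sqlite3

# Hypothetical table and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), ("a@example.com",)],
)

# Group by email and keep only the groups that occur more than once.
duplicates = conn.execute(
    "SELECT email, COUNT(email) AS email_count "
    "FROM email GROUP BY email HAVING COUNT(email) > 1"
).fetchall()

# Keep the lowest id for each email and delete every other row.
conn.execute(
    "DELETE FROM email WHERE id NOT IN (SELECT MIN(id) FROM email GROUP BY email)"
)
remaining = conn.execute("SELECT id, email FROM email ORDER BY id").fetchall()

print(duplicates)
print(remaining)
```

Doing the delete in a single statement avoids one round trip per duplicated value, which may matter on large tables.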
14π
You can chain .distinct() on the end of your queryset to filter duplicates. Check out: http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
8π
You may be able to use the distinct() function, depending on your model. If you only want to retrieve a single field from the model, you could do something like:
email_list = Emails.objects.values_list('email', flat=True).order_by('email').distinct()
which should give you an ordered list of emails.
3π
You can also use set():
email_list = set(Emails.objects.values_list('email', flat=True))
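One thing to keep in mind with set() is that it de-duplicates in Python and does not preserve order. The sketch below uses a plain list of made-up addresses as a stand-in for the values_list result (an assumption, so that it runs without a database) and shows two ways to get a predictable order back.

```python
# Hypothetical values standing in for values_list('email', flat=True).
emails = ["b@example.com", "a@example.com", "b@example.com", "c@example.com"]

# set() removes duplicates but gives no ordering guarantee.
unique_unordered = set(emails)

# dict.fromkeys() keeps the first occurrence of each value, in original order.
unique_in_order = list(dict.fromkeys(emails))

# Sorting the set gives alphabetical order instead.
unique_sorted = sorted(set(emails))

print(unique_in_order)
print(unique_sorted)
```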
1π
Use queryset.annotate() with a subquery on the same model:
from django.db.models import Subquery, OuterRef
# For each distinct email, pick the lowest pk, then fetch only those rows.
email_list = Emails.objects.filter(
    pk__in=Emails.objects.values('email').distinct().annotate(
        pk=Subquery(
            Emails.objects.filter(email=OuterRef('email'))
            .order_by('pk')
            .values('pk')[:1]
        )
    )
    .values_list('pk', flat=True)
)
This queryset generates the following SQL:
SELECT `email`.`id`,
       `email`.`title`,
       `email`.`body`,
       ...
       ...
FROM `email`
WHERE `email`.`id` IN (
    SELECT DISTINCT (
        SELECT U0.`id`
        FROM `email` U0
        WHERE U0.`email` = V0.`email`
        ORDER BY U0.`id` ASC
        LIMIT 1
    ) AS `pk`
    FROM `email` V0
)
Cheat sheet:
from django.db.models import Subquery, OuterRef
group_by_duplicate_col_queryset = Models.objects.filter(
    pk__in=Models.objects.values('duplicate_col').distinct().annotate(
        pk=Subquery(
            Models.objects.filter(duplicate_col=OuterRef('duplicate_col'))
            .order_by('pk')
            .values('pk')[:1]
        )
    )
    .values_list('pk', flat=True)
)
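The correlated-subquery pattern can be tried without Django as well. This is a minimal sketch with sqlite3, assuming a hypothetical email table with a title column and a few sample rows; the SQL mirrors the shape of the "first pk per value" idea rather than the exact statement the ORM generates.

```python
import sqlite3

# Hypothetical table and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO email (email, title) VALUES (?, ?)",
    [
        ("a@example.com", "first a"),
        ("b@example.com", "first b"),
        ("a@example.com", "second a"),
    ],
)

# For each row v0, the inner query finds the lowest id sharing its email;
# DISTINCT collapses those ids, and the outer query fetches the full rows.
first_rows = conn.execute(
    """
    SELECT id, email, title
    FROM email
    WHERE id IN (
        SELECT DISTINCT (
            SELECT u0.id
            FROM email u0
            WHERE u0.email = v0.email
            ORDER BY u0.id ASC
            LIMIT 1
        )
        FROM email v0
    )
    ORDER BY id
    """
).fetchall()

print(first_rows)
```

Unlike a plain DISTINCT on one column, this keeps every column of the surviving row.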
0π
I used the following to actually remove the duplicate entries from the database; hopefully this helps someone else. Note that distinct() with field names is only supported on PostgreSQL.
adds = Address.objects.all()
d = adds.distinct('latitude', 'longitude')
for address in adds:
    if address not in d:
        address.delete()
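If the database backend does not support distinct() with field names, the same "keep the first, delete the rest" logic can be sketched in plain Python. The tuples below are made-up stand-ins for Address rows (an assumption for illustration); in a real project the collected ids could then be passed to something like filter(pk__in=...).delete().

```python
# Hypothetical (id, latitude, longitude) rows standing in for Address objects.
addresses = [
    (1, 51.5, -0.12),
    (2, 48.85, 2.35),
    (3, 51.5, -0.12),
    (4, 48.85, 2.35),
    (5, 40.71, -74.0),
]

seen = set()       # (latitude, longitude) pairs already encountered
keep_ids = []      # first row seen for each pair
delete_ids = []    # later rows repeating an earlier pair

for address_id, latitude, longitude in addresses:
    key = (latitude, longitude)
    if key in seen:
        delete_ids.append(address_id)
    else:
        seen.add(key)
        keep_ids.append(address_id)

print(keep_ids)
print(delete_ids)
```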
0π
If you want to remove duplicates from a queryset, say you have a User model with fields like name and email and you want to drop duplicate emails, you can use the distinct() method with a field name (supported only on PostgreSQL):
User.objects.all().distinct("email")
It returns one User row per unique email.
-2π
You can use this raw query:
your_model.objects.raw("select * from appname_Your_model group by column_name")