198👍
Using order_by('?')
will kill the db server on the second day in production. A better way is something like what is described in Getting a random row from a relational database.
from django.db.models.aggregates import Count
from random import randint
class PaintingManager(models.Manager):
def random(self):
count = self.aggregate(count=Count('id'))['count']
random_index = randint(0, count - 1)
return self.all()[random_index]
- [Django]-Why won't Django use IPython?
- [Django]-Homepage login form Django
- [Django]-Django – "Incorrect type. Expected pk value, received str" error
25👍
The solutions with order_by(‘?’)[:N] are extremely slow even for medium-sized tables if you use MySQL (don’t know about other databases).
order_by('?')[:N]
will be translated to SELECT ... FROM ... WHERE ... ORDER BY RAND() LIMIT N
query.
It means that for every row in table the RAND() function will be executed, then the whole table will be sorted according to value of this function and then first N records will be returned. If your tables are small, this is fine. But in most cases this is a very slow query.
I wrote simple function that works even if id’s have holes (some rows where deleted):
def get_random_item(model, max_id=None):
if max_id is None:
max_id = model.objects.aggregate(Max('id')).values()[0]
min_id = math.ceil(max_id*random.random())
return model.objects.filter(id__gte=min_id)[0]
It is faster than order_by(‘?’) in almost all cases.
- [Django]-The QuerySet value for an exact lookup must be limited to one result using slicing. Filter error
- [Django]-How do I remove Label text in Django generated form?
- [Django]-Sending HTML email in django
25👍
Here’s a simple solution:
from random import randint
count = Model.objects.count()
random_object = Model.objects.all()[randint(0, count - 1)] #single random object
- [Django]-How to change empty_label for modelForm choice field?
- [Django]-Django models.py Circular Foreign Key
- [Django]-Django-taggit – how do I display the tags related to each record
11👍
You could create a manager on your model to do this sort of thing. To first understand what a manager is, the Painting.objects
method is a manager that contains all()
, filter()
, get()
, etc. Creating your own manager allows you to pre-filter results and have all these same methods, as well as your own custom methods, work on the results.
EDIT: I modified my code to reflect the order_by['?']
method. Note that the manager returns an unlimited number of random models. Because of this I’ve included a bit of usage code to show how to get just a single model.
from django.db import models
class RandomManager(models.Manager):
def get_query_set(self):
return super(RandomManager, self).get_query_set().order_by('?')
class Painting(models.Model):
title = models.CharField(max_length=100)
author = models.CharField(max_length=50)
objects = models.Manager() # The default manager.
randoms = RandomManager() # The random-specific manager.
Usage
random_painting = Painting.randoms.all()[0]
Lastly, you can have many managers on your models, so feel free to create a LeastViewsManager()
or MostPopularManager()
.
- [Django]-New url format in Django 1.9
- [Django]-Django models: mutual references between two classes and impossibility to use forward declaration in python
- [Django]-Django 2, python 3.4 cannot decode urlsafe_base64_decode(uidb64)
7👍
The other answers are either potentially slow (using order_by('?')
) or use more than one SQL query. Here’s a sample solution with no ordering and just one query (assuming Postgres):
random_instance_or_none = Model.objects.raw('''
select * from {0} limit 1
offset floor(random() * (select count(*) from {0}))
'''.format(Model._meta.db_table)).first()
Be aware that this will raise an index error if the table is empty. Write yourself a model-agnostic helper function to check for that.
- [Django]-Get list item dynamically in django templates
- [Django]-Django AutoField with primary_key vs default pk
- [Django]-How do you detect a new instance of the model in Django's model.save()
4👍
Just a simple idea how I do it:
def _get_random_service(self, professional):
services = Service.objects.filter(professional=professional)
i = randint(0, services.count()-1)
return services[i]
- [Django]-Django 1.5b1: executing django-admin.py causes "No module named settings" error
- [Django]-What is the Simplest Possible Payment Gateway to Implement? (using Django)
- [Django]-How can I keep test data after Django tests complete?
3👍
Hi I needed to select a random record from a queryset who’s length I also needed to report (ie web page produced described item and said records left)
q = Entity.objects.filter(attribute_value='this or that')
item_count = q.count()
random_item = q[random.randomint(1,item_count+1)]
took half as long(0.7s vs 1.7s) as:
item_count = q.count()
random_item = random.choice(q)
I’m guessing it avoids pulling down the whole query before selecting the random entry and made my system responsive enough for a page that is accessed repeatedly for a repetitive task where users want to see the item_count count down.
- [Django]-Negating a boolean in Django template
- [Django]-Pylint "unresolved import" error in Visual Studio Code
- [Django]-Django celery task: Newly created model DoesNotExist
3👍
Randomization in DB feels nasty and better in python. But at the same time, it’s not a good idea to bring all the data from DB to python memory just to ignore most of the results (especially in the production environment). we might need some sort of filtering also.
- So Basically we have data at DB,
- we wanna use the rand function of python
- and afterwords bring up the whole required data from DB.
Basically using 2 queries will be much less expensive than picking random in DB CPU (computing in DB) or loading whole data (heavy Network Utilization). Solutions explained must need a scalable nature trying to plan here won’t work for a production environment espicially with filters, soft/hard deletes, or even with an is_public flag. because probably random id we generated might be deleted from the database or will be cut down in filters. Its a bad practice to assume max_id(records) == count(records).
(Ofcouce, If you do’not delete a percentage of data which is comparable to query uses, or if you dont wanna use any kond of filters, and if you are confident, random id which you can proceed with a random )
if you want only one items.
Refer ( @Valter Silva )
import random
mgr = models.Painting.objects
qs = mgr.filter(...)
random_id = random.choice(1, qs.count())-1 # <--- [ First Query Hit ]
random_paint = qs[random_id] ## <-- [ Second Query Hit ]
if you want ‘n’ items.
import random
req_no_of_random_items = 8 ## i need 8 random items.
qs = models.Painting.objects.filter(...)
## if u prefer to use random values often, you can keep this in cache.
possible_ids = list(qs.values_list('id', flat=True)) # <--- [ First Query Hit ]
possible_ids = random.choices(possible_ids, k=8)
random_paint = qs.filter(pk__in=possible_ids) ## in a generic case to get 'n' items.
or if you want to have a more optimized code for production, use a cachefunction to get ids of products:
from django.core.cache import cache
def id_set_cache(qs):
key = "some_random_key_for_cache"
id_set = cache.get(key)
if id_set is None:
id_set = list(qs.values_list('id', flat=True)
cache.set(key, id_set)
retrun id_set
- [Django]-Django – How to rename a model field using South?
- [Django]-Django – Render the <label> of a single form field
- [Django]-Class has no objects member
2👍
Just to note a (fairly common) special case, if there is a indexed auto-increment column in the table with no deletes, the optimum way to do a random select is a query like:
SELECT * FROM table WHERE id = RAND() LIMIT 1
that assumes such a column named id for table. In django you can do this by:
Painting.objects.raw('SELECT * FROM appname_painting WHERE id = RAND() LIMIT 1')
in which you must replace appname with your application name.
In General, with an id column, the order_by(‘?’) can be done much faster with:
Paiting.objects.raw(
'SELECT * FROM auth_user WHERE id>=RAND() * (SELECT MAX(id) FROM auth_user) LIMIT %d'
% needed_count)
- [Django]-Timestamp fields in django
- [Django]-With DEBUG=False, how can I log django exceptions to a log file
- [Django]-Django Cannot set values on a ManyToManyField which specifies an intermediary model. Use Manager instead
2👍
Method for auto-incrementing primary key with no deletes
If you have a table where the primary key is a sequential integer with no gaps, then the following method should work:
import random
max_id = MyModel.objects.last().id
random_id = random.randint(0, max_id)
random_obj = MyModel.objects.get(pk=random_id)
This method is much more efficient than other methods here that iterate through all rows of the table. While it does require two database queries, both are trivial. Furthermore, it’s simple and doesn’t require defining any extra classes. However, it’s applicability is limited to tables with an auto-incrementing primary key where rows have never deleted, such that there are no gaps in the sequence of ids.
In the case where rows have been deleted such that are gaps, this method could still work if it is retried until an existing primary key is randomly selected.
References
- [Django]-Substring in a django template?
- [Django]-Django create userprofile if does not exist
- [Django]-Django-allauth: Linking multiple social accounts to a single user
2👍
In addition to all the different answers to this question, please remember the right solution depends on your use case.
Method 1 (big DB, no deletions, pk starts from 1)
If the records on your database have sequential primary keys with no gaps (if you never delete objects from your database):
Select one random object:
import random
# Get the biggest primary key
# (generally ):
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
# Pick a random primary key:
random_pk = random.randint(1, pk_max)
# Get that object:
random_object = MyModel.objects.get(pk=random_pk )
Select n random objects:
import random
n = 5
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
random_pk_list = random.sample(range(1,pk_max+1), n)
random_objects = MyModel.objects.filter(pk__in=random_pk_list)
# Note: this one returns a QuerySet
Problem: Will start crying if you ever have a gap in your primary key column. It makes a huge assumption about the future of your database for probably a very critical operation.
Method 2 (small DB, no deletions, pk starts from 1)
(Note: both A and B are not efficient at all. Only use these if you have to. If your DB table is small enough to do this, maybe you don’t even need a table for this and can use something like enums, etc.)
A. By randomly ordering the ENTIRE table with SQL:
# Get one random object:
random_object = MyModel.objects.order_by('?').first()
# Get n Random objects:
random_objects = MyModel.objects.order_by('?')[:n]
Problem: It uses SQL’s ORDER BY RAND()
, which may go on your criminal record as a misdemeanor in certain countries in the near future. Also there is a very good chance that it will generate a new temporary table and random numbers for every single record before ordering them based on these random numbers. See this answer for a great technical explanation.
B. By loading ALL the objects from the table first:
import random
# Don't do this!
every_single_object_ever = MyModel.objects.all()
# Get one random object:
random_object = random.choice(every_single_object_ever)
# Get n Random objects:
random_objects = random.choices(every_single_object_ever, k=n)
Problem: You are loading all the objects from your table as Django’s MyModel instances to get a single(or several) record.
(Tbh ‘every_single_object_ever is a queryset until you use it in the random.choice method’)
You can improve this a little bit;
C. By loading ALL the ids from the table first:
Remember that this solution is still terrible, and now it uses two queries instead of one:
import random
every_single_pk_ever = MyModel.objects.values_list('id', flat=True)
# Get one random object:
random_object = random.choice(every_single_pk_ever )
# Get n Random objects:
random_objects = random.choices(every_single_pk_ever , k=n)
Problem: Loads every single pk in the table just to get some of them.
Method 3 (big DB with deletions(or gaps), pk is an integer)
Method 1 assumes that you have sequential primary keys with no gaps. We can get around this using SQL’s OFFSET X LIMIT Y
. So instead of defining the primary keys, you will specify the index of the row.
This is almost identical to Emil Ivanov’s answer:
# first query:
count = MyModel.objects.all().count()
# second query:
random_offset = random.randint(0,count-1)
MyModel.objects.all()[random_offset].get()
This works efficiently because Django uses python’s slicing to modify the query’s LIMIT and OFFSET values. See the docs for more on this.
Problem: Since this one uses two queries, total count of the records may change during this period, as Nelo Mitranim stated before. This may throw an exception. Notice how I used .get() at the end. This is to make sure the code throws a DoesNotExist exception if anything goes wrong. Without the .get(), it would throw an IndexError instead.
For multiple objects, you could do MyModel.objects.all()[random_offset-n:random_offset]
but this would make your ‘random’ objects have sequential row numbers, which means it is more of a random slice from your database. For a list of completely random objects, you would need to get random.sample() between 1 and your count, and run the MyModel.objects.all()[random_offset]
line in a for loop. Which gets quite painful if you need lets say 50 random objects. Which means:
We need another solution for gathering multiple random objects.
Let’s say we need 3 random objects. What happens if I do this?
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
random_pk_list = random.sample(range(1,pk_max+1), 10)
random_objects = MyModel.objects.filter(pk__in=random_pk_list)[:3]
It translates to this:
SELECT * FROM `myapp_mymodel` WHERE id IN (1, 3, 7, 8, 11, 13, 17, 21, 25, 29) LIMIT 3
I only need 3 objects. But I am giving it a list of 10 possible primary keys to choose from. This lowers my chances of hitting a gap and ending up with zero, one or two objects. Performance? Well, since your database engine stops looking for more records when it hits the LIMIT, generating random numbers at the beginning is the only performance sacrifice you are making. Which is not much. If you are positive that you have enough records, you can send a thousand possible primary keys instead:
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
random_pk_list = random.sample(range(1,pk_max+1), 1000)
random_objects = MyModel.objects.filter(pk__in=random_pk_list)[:3]
Method 4 (Database specific SQL queries, aka getting your hands dirty)
You can always execute raw queries with Django. This might be as simple as this for some database engines:
random_object_list = MyModel.objects.raw('SELECT * FROM myapp_mymodel LIMIT 1 OFFSET TRUNCATE(RANDOM() * (SELECT COUNT(*) FROM myapp_mymodel))')
But most of them require a constant for the OFFSET. A working alternative for the Mysql can be found below(Note: this is an overkill). It locks the table. Sets a random offset based on the count of objects in our table to a variable, and executes the Select query with this random OFFSET value.
from django.db import connection
with connection.cursor():
cursor.execute('LOCK TABLE myapp_mymodel WRITE;')
cursor.execute("SET @randomoffset=CAST(truncate((SELECT COUNT(*) FROM
myapp_mymodel)*RAND(),0)+0 AS SIGNED);")
cursor.execute("PREPARE STMT FROM 'SELECT * FROM myapp_mymodel LIMIT 1 OFFSET ?';")
cursor.execute("EXECUTE STMT USING @randomoffset;")
row = cursor.fetchone()
cursor.execute('UNLOCK TABLES;')
print(row)
You can still increase the performance. Two naive solutions I can think of are importing MySQLDb directly and executing above commands in a single cursor.execute() without the ugly table lock, or using ROWID(if supported) in a where clause instead of the OFFSET since high OFFSETs can cause performance issues.
Final note:
Since it is easier to ask for forgiveness than permission, I’d say just go with method 3, and if you encounter an error, try it again. Do it in a while loop if you really need it, most of the time the query will only need to run once.
- [Django]-How do I clone a Django model instance object and save it to the database?
- [Django]-Django: sqlite for dev, mysql for prod?
- [Django]-Printing Objects in Django
1👍
This is Highly recomended Getting a random row from a relational database
Because using django orm to do such a thing like that, will makes your db server angry specially if you have big data table 😐
And the solution is provide a Model Manager and write the SQL query by hand 😉
Update:
Another solution which works on any database backend even non-rel ones without writing custom ModelManager
. Getting Random objects from a Queryset in Django
- [Django]-Django: For Loop to Iterate Form Fields
- [Django]-Django-allauth social account connect to existing account on login
- [Django]-Django REST Framework: adding additional field to ModelSerializer
1👍
You may want to use the same approach that you’d use to sample any iterator, especially if you plan to sample multiple items to create a sample set. @MatijnPieters and @DzinX put a lot of thought into this:
def random_sampling(qs, N=1):
"""Sample any iterable (like a Django QuerySet) to retrieve N random elements
Arguments:
qs (iterable): Any iterable (like a Django QuerySet)
N (int): Number of samples to retrieve at random from the iterable
References:
@DZinX: https://stackoverflow.com/a/12583436/623735
@MartinPieters: https://stackoverflow.com/a/12581484/623735
"""
samples = []
iterator = iter(qs)
# Get the first `N` elements and put them in your results list to preallocate memory
try:
for _ in xrange(N):
samples.append(iterator.next())
except StopIteration:
raise ValueError("N, the number of reuested samples, is larger than the length of the iterable.")
random.shuffle(samples) # Randomize your list of N objects
# Now replace each element by a truly random sample
for i, v in enumerate(qs, N):
r = random.randint(0, i)
if r < N:
samples[r] = v # at a decreasing rate, replace random items
return samples
- [Django]-How can I create a deep clone of a DB object in Django?
- [Django]-Django count RawQuerySet
- [Django]-Django select_for_update cannot be used outside of a transaction
1👍
One much easier approach to this involves simply filtering down to the recordset of interest and using random.sample
to select as many as you want:
from myapp.models import MyModel
import random
my_queryset = MyModel.objects.filter(criteria=True) # Returns a QuerySet
my_object = random.sample(my_queryset, 1) # get a single random element from my_queryset
my_objects = random.sample(my_queryset, 5) # get five random elements from my_queryset
Note that you should have some code in place to verify that my_queryset
is not empty; random.sample
returns ValueError: sample larger than population
if the first argument contains too few elements.
- [Django]-Changing a project name in django
- [Django]-Does SQLAlchemy have an equivalent of Django's get_or_create?
- [Django]-Django manage.py runserver invalid syntax
0👍
Fast way via native SQL.
tested with MySQL 8.x and Django 4.x
I pull x random set of record id's
from the table. With these id’s I fetch the objects.
Example:
sql = "SELECT id FROM paintings ORDER BY RAND() LIMIT 1"
list_of_ids = [p.id for p in Painting.objects.raw(sql)]
list_of_random_paintings = Painting.objects.filter(id__in=list_of_ids).all()
- via a optional WHERE you can change the scope of your query
- LIMIT is set to 1, if you need more records just change it
Quick benchmark, on:
- 107.251 records ~ 0.13 seconds
- 931.103 records ~ 0.8 seconds
- 8.044,965 records ~ 3.4 seconds
The bigger the table, the longer it takes. After all, using this depends on your use-case.
- [Django]-Celery : Execute task after a specific time gap
- [Django]-Has Django served an excess of 100k daily visits?
- [Django]-Django: How to format a DateField's date representation?
0👍
This is in case you want to pull a LIST of random records. Pretty useful when you need to show only a random sample of a category.
import random
all_cat_objects = Model.objects.filter(category='category-name')
sample_objects = random.choices(all_cat_objects, k=10) # only select 10
If the database is too big, get random datetime or pk field ranges, use WHERE clause and pick from them in python.
- [Django]-Referencing multiple submit buttons in django
- [Django]-How to assign items inside a Model object with Django?
- [Django]-Django url tag multiple parameters
-1👍
I got very simple solution, make custom manager:
class RandomManager(models.Manager):
def random(self):
return random.choice(self.all())
and then add in model:
class Example(models.Model):
name = models.CharField(max_length=128)
objects = RandomManager()
Now, you can use it:
Example.objects.random()
- [Django]-Function decorators with parameters on a class based view in Django
- [Django]-VueJS + Django Channels
- [Django]-Django Form File Field disappears on form error