How to pull a random record using Django's ORM?

Posted by: admin, November 1, 2017

Questions:

I have a model that represents paintings I present on my site. On the main webpage I’d like to show some of them: the newest, the one that has gone unvisited the longest, the most popular one, and a random one.

I’m using Django 1.0.2.

While the first three are easy to pull using Django models, the last one (random) is giving me some trouble. I could of course code it in my view, something like this:

import random

number_of_records = models.Painting.objects.count()
random_index = int(random.random()*number_of_records)+1
random_paint = models.Painting.objects.get(pk=random_index)

It doesn’t look like something I’d like to have in my view, though: this is entirely part of the database abstraction and should be in the model. Also, here I need to take care of removed records (the number of records then no longer covers all the possible key values) and probably lots of other things.

Are there any other options for doing this, preferably somewhere inside the model abstraction?

Answers:

Using order_by('?') can bring the db server to its knees in production once the table grows, because the database has to order every row randomly just to return one. A better way is something like what is described in Getting a random row from a relational database.

from random import randint

from django.db import models
from django.db.models.aggregates import Count

class PaintingManager(models.Manager):
    def random(self):
        count = self.aggregate(count=Count('id'))['count']
        # randint raises ValueError when the table is empty (count == 0)
        random_index = randint(0, count - 1)
        return self.all()[random_index]

Answers:

Simply use:

MyModel.objects.order_by('?').first()

It is documented in the QuerySet API reference. Note that first() is only available from Django 1.6 onward.

Answers:

The solutions with order_by('?')[:N] are extremely slow even for medium-sized tables if you use MySQL (I don’t know about other databases).

order_by('?')[:N] will be translated into a SELECT ... FROM ... WHERE ... ORDER BY RAND() LIMIT N query.

This means that the RAND() function is executed for every row in the table, then the whole table is sorted by the value of this function, and only then are the first N records returned. If your tables are small, this is fine. But in most cases this is a very slow query.
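To make the two query shapes concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The painting table and its columns are made up for illustration, and SQLite spells the function RANDOM() where MySQL spells it RAND(); the point is only to contrast "sort everything randomly" with "count, then fetch one row by offset".

```python
import random
import sqlite3

# In-memory SQLite table standing in for a model's table (hypothetical name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE painting (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO painting (title) VALUES (?)",
    [("painting %d" % i,) for i in range(1000)],
)

# Shape 1: roughly what order_by('?')[:1] boils down to. Every row gets a
# random value and all rows are sorted before the limit is applied.
row_sorted = conn.execute(
    "SELECT id, title FROM painting ORDER BY RANDOM() LIMIT 1"
).fetchone()

# Shape 2: count the rows, pick an offset in Python, fetch a single row.
(count,) = conn.execute("SELECT COUNT(*) FROM painting").fetchone()
offset = random.randrange(count)
row_offset = conn.execute(
    "SELECT id, title FROM painting ORDER BY id LIMIT 1 OFFSET ?", (offset,)
).fetchone()

print(row_sorted, row_offset)
```

The second shape costs two queries and the database still has to step over the skipped rows, but it avoids sorting the whole table.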

I wrote a simple function that works even if the ids have holes (some rows were deleted):

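import math
import random
from django.db.models import Max

def get_random_item(model, max_id=None):
    if max_id is None:
        max_id = model.objects.aggregate(max_id=Max('id'))['max_id']
    min_id = math.ceil(max_id * random.random())
    # Order by id so the first match is the row closest to the random threshold.
    return model.objects.filter(id__gte=min_id).order_by('id')[0]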

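It is faster than order_by('?') in almost all cases. Keep in mind that when the ids have holes the result is not uniformly random: a row that sits right after a hole is picked more often than the others.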
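To get a feel for how the "first id at or above a random threshold" selection behaves when ids have holes, here is a small pure-Python simulation. It does not touch Django; the pick_id helper and the list of ids are hypothetical and only mimic the filter(id__gte=...) lookup on a sorted list.

```python
import bisect
import math
import random
from collections import Counter

def pick_id(sorted_ids, rng):
    """Mimic 'first row with id >= random threshold' on a plain sorted list."""
    max_id = sorted_ids[-1]
    threshold = math.ceil(max_id * rng.random())
    # Index of the first existing id that is >= threshold.
    return sorted_ids[bisect.bisect_left(sorted_ids, threshold)]

# Hypothetical table where the rows with ids 4..9 were deleted.
ids = [1, 2, 3, 10]
rng = random.Random(0)
counts = Counter(pick_id(ids, rng) for _ in range(10000))
print(counts)
```

Every draw lands on an existing id, but id 10 absorbs all the thresholds that fall into the hole, so it comes up far more often than 1, 2 or 3. Whether that skew matters depends on how many deletes the table sees.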

Answers:

You could create a manager on your model to do this sort of thing. To understand what a manager is: Painting.objects is a manager, and it provides all(), filter(), get(), etc. Creating your own manager allows you to pre-filter results and have all these same methods, as well as your own custom methods, work on the results.

EDIT: I modified my code to reflect the order_by('?') method. Note that the manager returns all rows in random order, not just one. Because of this I’ve included a bit of usage code to show how to get just a single model.

from django.db import models

class RandomManager(models.Manager):
    # This method was named get_query_set() before Django 1.6.
    def get_queryset(self):
        return super(RandomManager, self).get_queryset().order_by('?')

class Painting(models.Model):
    title = models.CharField(max_length=100)
    author = models.CharField(max_length=50)

    objects = models.Manager() # The default manager.
    randoms = RandomManager() # The random-specific manager.

Usage

random_painting = Painting.randoms.all()[0]

Lastly, you can have many managers on your models, so feel free to create a LeastViewsManager() or MostPopularManager().

Answers:

The other answers are either potentially slow (using order_by('?')) or use more than one SQL query. Here’s a sample solution with no ordering and just one query (assuming Postgres):

Model.objects.raw('''
    select * from {0} limit 1
    offset floor(random() * (select count(*) from {0}))
'''.format(Model._meta.db_table))[0]

Be aware that this will raise an index error if the table is empty. Write yourself a model-agnostic helper function to check for that.
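One possible shape for such a helper is sketched below. The name random_row_or_none is made up; the function only relies on the model exposing _meta.db_table and objects.raw() as in the snippet above, and it keeps the Postgres-specific SQL unchanged.

```python
def random_row_or_none(model):
    """Return one random row of `model`, or None if the table is empty.

    Assumes a Postgres backend, like the raw query it wraps.
    """
    sql = (
        "select * from {0} limit 1 "
        "offset floor(random() * (select count(*) from {0}))"
    ).format(model._meta.db_table)
    try:
        # Indexing the raw result raises IndexError when no row comes back.
        return model.objects.raw(sql)[0]
    except IndexError:
        return None
```

Usage would then be something like random_row_or_none(Painting), with the caller deciding what to show when it gets None back.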

Answers:

This is highly recommended: Getting a random row from a relational database

Using the Django ORM for this kind of thing will make your db server angry, especially if you have a big table 😐

The solution is to provide a model manager and write the SQL query by hand 😉

Update:

Another solution that works on any database backend, even non-relational ones, without writing a custom ModelManager: Getting Random objects from a Queryset in Django

Answers:

Just a simple idea of how I do it:

from random import randint

def _get_random_service(self, professional):
    services = Service.objects.filter(professional=professional)
    # randint raises ValueError if the professional has no services
    i = randint(0, services.count()-1)
    return services[i]

Answers:

One much easier approach to this involves simply filtering down to the recordset of interest and using random.sample to select as many as you want:

from myapp.models import MyModel
import random

my_queryset = MyModel.objects.filter(criteria=True)  # Returns a QuerySet
candidates = list(my_queryset)  # runs the query and loads every matching row into memory
my_object = random.sample(candidates, 1)[0]  # get a single random element
my_objects = random.sample(candidates, 5)  # get a list of five random elements

Note that you should have some code in place to verify that my_queryset is not empty; random.sample raises a ValueError if the first argument contains fewer elements than you ask for.
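A minimal sketch of such a guard, assuming it is acceptable to get back fewer elements than requested when the population is small. The helper name sample_up_to is made up for this example.

```python
import random

def sample_up_to(items, k):
    """Return up to k distinct random elements from items without raising."""
    # For a QuerySet, list() runs the query and loads all matching rows.
    pool = list(items)
    # Clamp k so random.sample never sees a sample larger than the population.
    return random.sample(pool, min(k, len(pool)))
```

With this, sample_up_to(my_queryset, 5) returns an empty list for an empty queryset instead of raising. Since it still loads every matching row into memory, it only suits result sets that are already filtered down to a reasonable size.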

Answers:

Here’s a simple solution:

from random import randint

count = Model.objects.count()
random_object = Model.objects.all()[randint(0, count - 1)] #single random object
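The same idea can be wrapped in a small helper that copes with an empty table. This is only a sketch: random_or_none is a made-up name, and the function assumes nothing beyond the count() method and integer indexing that a QuerySet provides.

```python
from random import randint

def random_or_none(queryset):
    """Return one random item from queryset, or None if there is nothing to pick."""
    count = queryset.count()
    if count == 0:
        return None
    try:
        return queryset[randint(0, count - 1)]
    except IndexError:
        # A row may have been deleted between the count and the fetch.
        return None
```

It would be called as random_or_none(Model.objects.all()), or with any filtered queryset.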

Answers:

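Just to note a (fairly common) special case: if the table has an indexed auto-increment column and no deletes, the optimal way to do a random select is to pick a random id between 1 and the maximum id and fetch that row directly. RAND() returns a float between 0 and 1, so it has to be scaled to the id range first, and it is computed once in a subquery so that it is not re-evaluated for every row (MySQL syntax):

SELECT t.* FROM table AS t JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM table)) AS id) AS r ON t.id = r.id LIMIT 1

This assumes the column is named id. In Django you can do this with:

Painting.objects.raw(
    'SELECT t.* FROM appname_painting AS t '
    'JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM appname_painting)) AS id) AS r '
    'ON t.id = r.id LIMIT 1')

in which you must replace appname with your application name.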

In general, with an id column, the effect of order_by('?') can be approximated much faster with:

Painting.objects.raw(
        'SELECT * FROM appname_painting WHERE id >= RAND() * (SELECT MAX(id) FROM appname_painting) LIMIT %s',
    [needed_count])

Answers:

You may want to use the same approach that you’d use to sample any iterator, especially if you plan to sample multiple items to create a sample set. @MartijnPieters and @DzinX put a lot of thought into this:

import random

def random_sampling(qs, N=1):
    """Sample any iterable (like a Django QuerySet) to retrieve N random elements

    Arguments:
      qs (iterable): Any iterable (like a Django QuerySet)
      N (int): Number of samples to retrieve at random from the iterable

    References:
      @DzinX:  https://stackoverflow.com/a/12583436/623735
      @MartijnPieters: https://stackoverflow.com/a/12581484/623735
    """
    samples = []
    iterator = iter(qs)
    # Get the first `N` elements and put them in your results list to preallocate memory
    try:
        for _ in range(N):
            samples.append(next(iterator))
    except StopIteration:
        raise ValueError("N, the number of requested samples, is larger than the length of the iterable.")
    random.shuffle(samples)  # Randomize your list of N objects
    # Now replace each element by a truly random sample, continuing with the
    # same iterator so the first N elements are not visited a second time
    for i, v in enumerate(iterator, N):
        r = random.randint(0, i)
        if r < N:
            samples[r] = v  # at a decreasing rate, replace random items
    return samples