May 31, 2012

Querysets aren't as lazy as you think

http://blog.roseman.org.uk/2012/05/30/querysets-arent-as-lazy-as-you-think/

It should be reasonably well known by now that querysets are lazy. That is, simply instantiating a queryset via a manager doesn't actually hit the database: that doesn't happen until the queryset is evaluated, for example by iterating over it or indexing into it. That's why the first field definition in the form below is safe, but not the second:
import datetime
from django import forms

class MyForm(forms.Form):
    my_field = forms.ModelChoiceField(queryset=MyModel.objects.all())  # lazy: safe
    my_datefield = forms.DateTimeField(initial=datetime.datetime.now())  # evaluated once: unsafe
Both definitions call a method when the class is defined. The first is safe because all() only builds a queryset, which is not evaluated at that point. The second is not, because datetime.datetime.now() returns its value immediately, and that value then stays the same for the lifetime of the current process (which can be many days). For the record, you should always pass the callable itself: initial=datetime.datetime.now, without the parentheses.
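
To see why the callable matters, here is a minimal plain-Python sketch that needs no Django. ToyField and ToyForm are hypothetical stand-ins, not Django's classes; they only assume that a callable initial is invoked at the moment the value is requested, while a plain value is stored as given.

```python
import datetime
import time


class ToyField:
    """Hypothetical stand-in for a form field (not Django's real class)."""

    def __init__(self, initial=None):
        self.initial = initial

    def get_initial(self):
        # Call the value if it is callable, otherwise return it unchanged.
        return self.initial() if callable(self.initial) else self.initial


class ToyForm:
    # Evaluated once, when the class body runs (i.e. at import time).
    frozen = ToyField(initial=datetime.datetime.now())
    # Stored as a callable and evaluated each time the value is requested.
    fresh = ToyField(initial=datetime.datetime.now)


first_frozen = ToyForm.frozen.get_initial()
first_fresh = ToyForm.fresh.get_initial()
time.sleep(0.05)  # let the clock move on between the two reads
second_frozen = ToyForm.frozen.get_initial()
second_fresh = ToyForm.fresh.get_initial()

print(first_frozen == second_frozen)  # the frozen value does not change
print(second_fresh > first_fresh)     # the callable yields a later time
```

The frozen field keeps reporting the moment the class was defined, which is exactly what happens to a form that lives in a long-running process.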

Now, there are a couple of gotchas here. It is perfectly possible to define manager methods that are not safe to use in places like the queryset argument to the first field above. Here's an example:
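import datetime
from django.db import models

class PublishedManager(models.Manager):
    def get_query_set(self):
        # now() runs whenever this method is called, not when the query is executed.
        return super(PublishedManager, self).get_query_set().filter(
                published_date__lte=datetime.datetime.now())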
Clearly, this is an attempt to create a manager that automatically returns only items that have already been published. When it is called from a view, it works exactly as expected. But if you pass it to the queryset parameter of the form field, the same thing happens as with the date field: the cut-off point is set when the form is first imported, and persists for the life of the process.

This is because there's nothing magical about manager methods that makes them lazy. The laziness comes further down, inside the QuerySet class itself. get_query_set, which is called automatically by all(), runs as soon as it is called, i.e. when the form is first defined. At that point the right-hand side of the query expression is also evaluated, and the resulting value is passed to filter(). So no matter when you instantiate your form after this, during the lifetime of the process you will never see any objects whose published_date is later than the moment the form was first defined.

But note that if you change the published_date of an existing object to before that time, or even create a new object with such a date, you will see it. The queryset is still lazy, and the database is queried each time the form is instantiated; only the published_date parameter is fixed.
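
The split between a fixed parameter and a lazy query can be sketched without Django. ToyQuerySet below is a hypothetical in-memory imitation, not Django's QuerySet: it assumes only that filter() stores whatever value it is handed and that rows are read when iteration starts. The fixed start datetime stands in for the value datetime.datetime.now() would have returned at import time.

```python
import datetime


class ToyQuerySet:
    """Hypothetical lazy query over an in-memory 'table' (not Django's QuerySet)."""

    def __init__(self, table, cutoff=None):
        self.table = table      # shared list standing in for the database
        self.cutoff = cutoff    # already-computed value, stored as-is

    def filter(self, published_date__lte):
        # The argument was evaluated by the caller before we got here.
        return ToyQuerySet(self.table, cutoff=published_date__lte)

    def __iter__(self):
        # Lazy part: the table is only read when iteration starts.
        return iter([d for d in self.table
                     if self.cutoff is None or d <= self.cutoff])


table = []
start = datetime.datetime(2012, 5, 30, 12, 0)  # stand-in for now() at import time

# "Import time": the cut-off is computed once and baked into the query.
published = ToyQuerySet(table).filter(published_date__lte=start)

earlier = start - datetime.timedelta(days=1)
later = start + datetime.timedelta(days=1)
table.extend([earlier, later])   # rows added after the query was built

results = list(published)        # the table is read now, not earlier
print(results == [earlier])
```

The row dated before the cut-off shows up even though it was added after the query was built, while the later row never will, however long the process runs.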
