Python / Django: May 2012

May 31, 2012

Querysets aren't as lazy as you think

http://blog.roseman.org.uk/2012/05/30/querysets-arent-as-lazy-as-you-think/

It should be reasonably well-known by now that querysets are lazy. That is, simply instantiating a queryset via a manager doesn't actually hit the database: that doesn't happen until the queryset is sliced or iterated. That's why the first field definition in the form below is safe, but not the second:

class MyForm(forms.Form):
    my_field = forms.ModelChoiceField(queryset=MyModel.objects.all())
    my_datefield = forms.DateTimeField(initial=datetime.datetime.now())

Both fields call something at class-definition time. The first is safe because the queryset it creates is not evaluated at that point. The second is not: datetime.datetime.now() is evaluated immediately, so its value stays fixed for the lifetime of the current process (which can be many days). For the record, you should always pass the callable: initial=datetime.datetime.now, without the calling brackets.
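The difference between calling now() and passing now can be reproduced without Django. This is only a sketch: Field below is a toy stand-in (not Django's field class) that assumes the form layer calls initial when it is callable, and the names frozen and fresh are made up for the illustration.

```python
import datetime
import time


class Field:
    """Toy stand-in for a form field; not Django's implementation."""

    def __init__(self, initial=None):
        self.initial = initial

    def get_initial(self):
        # Call the value if it is callable, otherwise return it unchanged.
        return self.initial() if callable(self.initial) else self.initial


class MyForm:
    # now() runs once, while the class body is being executed.
    frozen = Field(initial=datetime.datetime.now())
    # now is stored uncalled and runs each time the initial value is requested.
    fresh = Field(initial=datetime.datetime.now)


first_frozen = MyForm.frozen.get_initial()
time.sleep(0.05)
second_frozen = MyForm.frozen.get_initial()
fresh_value = MyForm.fresh.get_initial()
```

Here second_frozen equals first_frozen however long the process runs, while fresh_value reflects the time of the request.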

Now, there are a couple of gotchas here. It is perfectly possible to define manager methods that are not safe to use in places like the queryset argument to the first field above. Here's an example:

class PublishedManager(models.Manager):
    def get_query_set(self):
        return super(PublishedManager, self).get_query_set().filter(
                published_date__lte=datetime.datetime.now())

Clearly, this is an attempt to create a manager that automatically filters for items that have been published. Called in a view, it works exactly as expected. But if you pass it as the queryset parameter of the form field, the same thing happens as with the date field: the cut-off point is set once, when the form is first imported, and persists for the life of the process.

This is because there's nothing magical about manager methods that makes them lazy. The laziness comes further down, inside the QuerySet class itself. get_query_set, which all() calls automatically, runs when it is called, i.e. when the form is first defined. At that point the right-hand side of the query expression, datetime.datetime.now(), is also evaluated, and the resulting value is passed to filter() on the queryset returned by the base Manager's get_query_set. So no matter when you instantiate your form afterwards, for the lifetime of the process you will never see any objects whose published_date is later than the moment the form class was first defined.

But note that if you change the published_date of an existing object to before that time - or even create a new object with that date - you will see it. The queryset is still lazy, and the database will be queried each time the form is instantiated: but that published_date parameter is fixed.
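A Django-free sketch of that behaviour, under stated assumptions: ROWS stands in for the database table, ToyQuerySet for a lazy queryset, and published() for the manager method. All three names are invented here. The cut-off is captured when published() is called, but the rows are only read when the result is iterated.

```python
import datetime

ROWS = []  # stand-in for the table: a list of published_date values


class ToyQuerySet:
    """Lazy in the same sense: the filter value is stored now, rows are read on iteration."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def __iter__(self):
        # The "database" is consulted only here, at iteration time.
        return iter([date for date in ROWS if date <= self.cutoff])


def published(now):
    # Like get_query_set() above: the argument is already a fixed value.
    return ToyQuerySet(cutoff=now)


definition_time = datetime.datetime(2012, 5, 30, 12, 0)
queryset = published(definition_time)  # "form definition"

# Rows added after the queryset was built:
ROWS.append(definition_time - datetime.timedelta(days=1))  # before the cut-off
ROWS.append(definition_time + datetime.timedelta(days=1))  # after the cut-off

visible = list(queryset)
```

visible contains only the row dated before definition_time: the new row is picked up because the lookup is still lazy, but the later one is excluded because the cut-off never moves.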

May 25, 2012

Pretty print for standard python shell

import pprint
import sys
sys.displayhook = pprint.pprint
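One caveat with the bare assignment: the default display hook skips None and stores the last result in _, whereas pprint.pprint does neither. A possible variant that keeps both behaviours, assuming Python 3 (the module is named builtins there); pretty_displayhook is just an illustrative name.

```python
import builtins
import pprint
import sys


def pretty_displayhook(value):
    """Pretty-print interactive results while keeping the usual `_` shortcut."""
    if value is None:
        return  # the default hook prints nothing for None
    builtins._ = None  # same order as the default hook: clear, print, then store
    pprint.pprint(value)
    builtins._ = value


sys.displayhook = pretty_displayhook
```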

May 21, 2012

Static site generator

http://nikola.ralsina.com.ar/handbook.html

Services, kernel modules

Use service --status-all to see what's running on your system, and service service-name stop to shut it down.

Use lsmod to see which kernel modules are loaded, and rmmod module-name to unload one.

May 16, 2012

Allowing only super user login decorator

from django.contrib.auth.decorators import user_passes_test

@user_passes_test(lambda u: u.is_superuser)
def foo_view(request):
    ...  # view body elided in the original

May 13, 2012

Building a higher-level query API: the right way to use django's ORM

Approach 2: Manager methods

class TodoManager(models.Manager):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = TodoManager()

Todo.objects.incomplete()
Todo.objects.high_priority()

# and (from comments)

incomplete_and_high_priority = Todo.objects.incomplete() & Todo.objects.high_priority()

Approach 3a: copy Django, proxy everything

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

    def incomplete(self):
        return self.get_query_set().incomplete()

    def high_priority(self):
        return self.get_query_set().high_priority()

Todo.objects.incomplete().high_priority()
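Chaining works in Approach 3a because the custom methods live on the queryset class, and each filtering call hands back another instance of that same class. A toy sketch of the idea without Django: ToyQuerySet and its dict rows are invented for the illustration and only mimic the shape of the real API.

```python
class ToyQuerySet:
    """Minimal stand-in for a queryset over a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        kept = [
            row for row in self.rows
            if all(row.get(field) == value for field, value in conditions.items())
        ]
        # Return the same class, so methods defined on subclasses survive the call.
        return type(self)(kept)


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


todos = TodoQuerySet([
    {"content": "write report", "is_done": False, "priority": 1},
    {"content": "water plants", "is_done": False, "priority": 3},
    {"content": "pay rent", "is_done": True, "priority": 1},
])

urgent = todos.incomplete().high_priority()
urgent_contents = [row["content"] for row in urgent.rows]
```

With manager-only methods (Approach 2) the first call returns a plain queryset, which has no high_priority(), so the second link in the chain fails.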

Approach 3b: django-model-utils

from model_utils.managers import PassThroughManager

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = PassThroughManager.for_queryset_class(TodoQuerySet)()

Original: http://dabapps.com/blog/higher-level-query-api-django-orm/

Fixing DeprecationWarning in Django 1.4

To make the application fail on every such warning, add this to manage.py:

import warnings
warnings.filterwarnings('error', category=DeprecationWarning)
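The effect of that filter can be seen in isolation with the standard library alone. In this sketch old_api is a made-up function, and warnings.catch_warnings() is used only to keep the filter from leaking out of the example.

```python
import warnings


def old_api():
    """Hypothetical function that still works but warns about deprecation."""
    warnings.warn("old_api() is deprecated", DeprecationWarning)
    return "still works"


with warnings.catch_warnings():
    # Same filter as in manage.py: turn the warning into an exception.
    warnings.filterwarnings('error', category=DeprecationWarning)
    try:
        old_api()
        raised = False
        message = None
    except DeprecationWarning as exc:
        raised = True
        message = str(exc)
```

The traceback produced by the exception points at the exact call that needs updating, which is the point of enabling the filter during development.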

To fix the problem with URLs:

Add the tag "{% load url from future %}" to the template.
Update the url tags: for example, "{% url myapp:view-name arg arg2 as the_url %}" should become "{% url 'myapp:view-name' arg arg2 as the_url %}".

http://blog.futurecolors.ru/2012/05/django-14.html

May 12, 2012

Optimizing Django

Sometimes it makes sense to optimize code that runs for only a few milliseconds:

Middleware
Context processors
Template tags in the base template

If the average response time is 100 ms and middleware accounts for 11 ms of it, then cutting that to 1 ms lets us serve roughly 10% more requests.

Make them lazy

You can't know for sure whether whatever you computed in your context processor will actually be used anywhere. So middleware and context processors should be lazy!
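The arithmetic behind that estimate, as a small sketch. It assumes throughput is inversely proportional to response time (for example, a fixed pool of workers that are busy for the whole response); extra_capacity is an illustrative name.

```python
def extra_capacity(response_ms, saved_ms):
    """Fractional increase in requests served when each response gets saved_ms faster.

    Assumes throughput is inversely proportional to response time.
    """
    return response_ms / (response_ms - saved_ms) - 1


# 100 ms responses, middleware cut from 11 ms to 1 ms: 10 ms saved per request.
gain = extra_capacity(response_ms=100, saved_ms=10)
print("{:.1%} more requests".format(gain))  # prints "11.1% more requests"
```

Under that assumption the gain is slightly better than the 10% quoted, because the saved time is measured against the new, shorter response.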

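from django.contrib.gis.geoip import GeoIP  # assumed import; module path as of Django 1.4
from django.utils.functional import lazy

class LocationMiddleware(object):
    def process_request(self, request):
        # Defer the GeoIP lookup until request.location is actually used.
        request.location = lazy(get_location, dict)(request)

def get_location(request):
    g = GeoIP()
    remote_ip = request.META.get('REMOTE_ADDR')
    return g.city(remote_ip)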

http://www.slideshare.net/MoscowDjango/django-12897658 slides 33 and 34
http://stackoverflow.com/questions/8563812/lazy-load-of-data-from-a-context-processor/8564778#8564778
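The idea of deferring the lookup can be shown without Django. The sketch below is a toy: LazyValue is an invented helper that computes on first access and caches the result, which is not a description of how django.utils.functional.lazy is implemented, and get_location here returns hard-coded, hypothetical data instead of querying GeoIP.

```python
class LazyValue:
    """Toy deferred value: runs func(*args) on first access, then reuses the result."""

    def __init__(self, func, *args):
        self._func = func
        self._args = args
        self._computed = False
        self._value = None

    def get(self):
        if not self._computed:
            self._value = self._func(*self._args)
            self._computed = True
        return self._value


lookups = []  # records every time the expensive lookup really runs


def get_location(remote_ip):
    lookups.append(remote_ip)
    return {"city": "Moscow", "ip": remote_ip}  # hypothetical result


class Request:
    """Bare stand-in for an HTTP request object."""


request = Request()
request.location = LazyValue(get_location, "203.0.113.7")  # what the middleware would do
lookups_before_use = len(lookups)

city = request.location.get()["city"]  # first use triggers the lookup
request.location.get()                 # second use is served from the cached value
```

A request whose view never touches request.location pays nothing for the lookup.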

Event-driven cache invalidation using a decorator

http://www.slideshare.net/MoscowDjango/django-12897658 slide 32
gametags.py:

@register.simple_tag
@cached(vary_on_args=True, locmem=True)
def games(platform=None, genre=None):
    ...

signals.py:

@receiver(post_save, sender=Game)
def inval_games(**kwargs):
    invalidate('games.templatetags.gametags.games')

http://pypi.python.org/pypi/django-cache-utils2
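The pattern itself (cache a function's result, drop it when a save event fires) can be sketched in plain Python. The cached and invalidate helpers below are toy stand-ins written for this note, not the django-cache-utils2 API, and on_game_saved plays the role of the post_save receiver.

```python
import functools

_CACHE = {}  # toy in-process store: {(function name, args): result}


def cached(func):
    """Memoize func by its positional arguments (toy stand-in)."""
    @functools.wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in _CACHE:
            _CACHE[key] = func(*args)
        return _CACHE[key]
    return wrapper


def invalidate(name):
    """Drop every cached result stored for the function called `name`."""
    for key in [k for k in _CACHE if k[0] == name]:
        del _CACHE[key]


GAMES = ["Doom"]  # stand-in for the Game table


@cached
def games():
    return list(GAMES)


def on_game_saved(title):
    """Plays the role of the post_save receiver: change the data, then invalidate."""
    GAMES.append(title)
    invalidate("games")


before = games()
GAMES.append("Quake")  # a change that bypasses the save event
stale = games()        # still the cached list
on_game_saved("Myst")  # the event clears the cache
after = games()
```

The stale read shows why invalidation has to be tied to every write path: a change that does not fire the event is invisible until the cache is cleared.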

May 10, 2012

Versioning

django-reversion-compare - Add compare view to django-reversion for comparing two versions of a reversion model

May 2, 2012

Three things you should never put in your database

As I've said in a few talks, the best way to improve your systems is by first not doing "dumb things". I don't mean you or your development staff are "dumb"; it's easy to overlook the implications of these types of decisions and not realize how bad they are for maintainability, let alone scaling. As a consultant I see this stuff all the time, and I have yet to see it work out well for anyone.

Images, files, and binary data

Your database supports BLOBs, so it must be a good idea to shove your files in there, right? No, it isn't! Hell, it isn't even very convenient to use with many DB language bindings.

There are a few problems with storing files in your database:
read/write to a DB is always slower than a filesystem
your DB backups grow to be huge and more time consuming
access to the files now requires going through your app and DB layers

The last two are the real killers. Storing your thumbnail images in your database? Great, now you can't use nginx or another lightweight web server to serve them up.

Do yourself a favor and store a simple relative path to your files on disk in the database or use something like S3 or any CDN instead.
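A minimal sketch of the "path in the database, bytes on disk" layout, using sqlite3 and a temporary directory so it runs anywhere. The thumbnail table, the media_root directory and the save_thumbnail helper are all assumptions made for the example.

```python
import os
import sqlite3
import tempfile

media_root = tempfile.mkdtemp()  # stands in for the directory the web server serves
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thumbnail (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")


def save_thumbnail(conn, media_root, relative_path, data):
    """Write the bytes to disk and record only the relative path in the database."""
    full_path = os.path.join(media_root, relative_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as handle:
        handle.write(data)
    cursor = conn.execute("INSERT INTO thumbnail (path) VALUES (?)", (relative_path,))
    conn.commit()
    return cursor.lastrowid


thumb_id = save_thumbnail(
    conn, media_root, os.path.join("thumbs", "cat.png"), b"fake image bytes"
)
stored_path = conn.execute(
    "SELECT path FROM thumbnail WHERE id = ?", (thumb_id,)
).fetchone()[0]
```

Because the row holds only a relative path, the files can be served straight from media_root, or moved to another host, without touching the application or the table.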

Ephemeral data

Usage statistics, metrics, GPS locations, session data: anything that is only useful to you for a short period of time or that changes frequently. If you find yourself DELETEing an hour's, day's, or week's worth of some table with a cron job, you're using the wrong tool for the job.

Use redis, statsd/graphite, Riak, or anything else that is better suited to that type of workload. The same advice goes for aggregations of ephemeral data that don't live for very long.

Sure it's possible to use a backhoe to plant some tomatoes in the garden, but it's far faster to grab the shovel in the garage than schedule time with a backhoe and have it arrive at your place and dig. Use the right tool(s) for the job at hand.

Logs

This one seems ok on the surface and the "I might need to use a complex query on them at some point in the future" argument seems to win people over. Storing your logs in a database isn't a HORRIBLE idea, but storing them in the same database as your other production data is.

Maybe you're conservative with your logging and only emit one log line per web request normally. That is still generating a log INSERT for every action on your site that is competing for resources that your users could be using. Turn up your logging to a verbose or debug level and watch your production database catch on fire!

Instead use something like Splunk, Loggly or plain old rotating flat files for your logs. The few times you need to inspect them in odd ways, even to the point of having to write a bit of code to find your answers, are easily outweighed by the constant load that database logging puts on your system.
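For the "plain old rotating flat files" option, Python's standard library already has a handler. A minimal sketch: the logger name, file name and size limits below are arbitrary choices for the demonstration, and a temporary directory is used so the example cleans up easily.

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

# Roll over once the file would exceed 1 KB and keep at most three old files.
handler = logging.handlers.RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("request_log")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo lines out of the root logger
logger.addHandler(handler)

# One line per "request", written to a file instead of INSERTed into a table.
for request_number in range(100):
    logger.info("GET /items/%d 200", request_number)

logger.removeHandler(handler)
handler.close()
log_files = sorted(os.listdir(log_dir))
```

Disk usage is capped by maxBytes and backupCount, and none of the writes compete with user queries on the production database.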

But wait, you're a unique snowflake and your problem is SO different that it's ok for you to do one of these three. No you aren't and no it really isn't. Trust me.

http://www.revsys.com/blog/2012/may/01/three-things-you-should-never-put-your-database/