[Django]-Loading a pickled list just once – Django\Python

3👍

How you can do it

Create a module called cache.py, then:

import cache
data = getattr(cache, 'data', '') or get_my_data()
cache.data = data  # keep it on the module so later calls in this process reuse it

This will load the data only once per server process (how many processes you have depends on your setup: your web server and whether you use WSGI or CGI). In the dev web server (./manage.py runserver), the process restarts every time you modify a file, so the cache is lost and the data is loaded again.

How it works

Modules in Python are imported only once for each Python process. If you import a module several times, each later import only returns a reference to the already imported module. So if you have an Apache running mod_wsgi with 4 workers, get_my_data() will be called only 4 times, as there are only 4 Python processes running. Remember that workers can die, be reloaded, be killed, etc., so the real count may be higher, but this should keep calls to get_my_data() to a minimum.
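A minimal sketch of the behaviour described above, runnable in a single process. The `cache` module is simulated by registering an empty module object in `sys.modules` (an assumption made only so the example needs no file on disk), and `get_my_data()` is a hypothetical loader that counts its own calls.

```python
import sys
import types

# Stand-in for an empty cache.py: registering a module object in sys.modules
# makes "import cache" succeed without a file on disk (demo only).
sys.modules["cache"] = types.ModuleType("cache")

load_calls = 0


def get_my_data():
    """Hypothetical expensive loader; counts how often it runs."""
    global load_calls
    load_calls += 1
    return ["a", "b", "c"]


def view():
    # A repeated import just returns the module already held in sys.modules.
    import cache
    if not hasattr(cache, "data"):
        cache.data = get_my_data()
    return cache.data


first = view()
second = view()
print(load_calls)       # 1
print(first is second)  # True
```

Both calls to `view()` see the same module object, so the loader runs once for the lifetime of the process.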

Gotcha: if one process modifies the cache data, others won’t know about it. If your data is meant to be static, it’s ok. If you need to keep it up to date, it won’t work. It’s true for this method or any method implying the use of a singleton, unless you can ensure you have only one Python process running (which you can, but this is not the purpose of this answer).

About the syntax:

getattr(cache, 'data', '') returns the attribute named ‘data’ of the object ‘cache’. If it doesn’t exist, it returns the last argument, here an empty string.

In Python, or is lazy: it stops evaluating as soon as the result is known, and it returns the operand that decided the result rather than True or False. In our case, if ‘data’ is an attribute of cache and its value is true in a boolean context (for example, a non-empty list), or has done its job and returns that value without running get_my_data(). However, if ‘data’ is not an attribute of cache, getattr returns the empty string, or treats it as False, then runs get_my_data() and returns its result. Note that a cached value that is itself false, such as an empty list, will also trigger a reload.
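The same semantics as a small runnable sketch. Here `types.SimpleNamespace` stands in for the cache module and `get_my_data()` is a hypothetical loader; both are assumptions for illustration.

```python
import types

cache = types.SimpleNamespace()  # stand-in for the cache module
calls = []


def get_my_data():
    """Hypothetical loader; records each call."""
    calls.append(1)
    return [1, 2, 3]


# Attribute missing: getattr returns '', which is falsy, so `or` runs the loader.
data = getattr(cache, "data", "") or get_my_data()
cache.data = data

# Attribute present and truthy: `or` returns it and never calls the loader.
again = getattr(cache, "data", "") or get_my_data()

# Caveat: a falsy cached value (an empty list here) triggers the loader again.
empty_cache = types.SimpleNamespace(data=[])
reloaded = getattr(empty_cache, "data", "") or get_my_data()
```

If an empty result is a legitimate value to cache, testing with hasattr(cache, 'data') instead of relying on or avoids the extra reload.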

Why you probably don’t want to do it anyway

  1. If every page of your website loads something that takes 2 seconds to generate on each request, something is wrong. You may want to rethink your architecture.
  2. If the call is not meant to return a value, but rather to run a process after a user action, then it’s probably better to run it as an asynchronous task, using a tool such as Celery.
  3. The re module caches compiled regexes anyway, so you probably don’t need to compile them yourself. The other data can probably be expressed as primitives. Store all of it as strings and other primitives in a cache backend such as memcached or redis; it’s going to be much cleaner. Plus, if one Python process updates the cache, the others will be aware of it. They won’t with the above snippet.

Last word about settings.py

You should not put it in the settings.py file:

  • If you hardcode it, your settings file is going to be unreadable, and annoying to keep in a source control tool.
  • You can’t put it there dynamically, as the settings module is read-only in Django unless you use some ugly hacks that can lead to unexpected problems.

2👍

I’d write a Python module – a singleton class with an init method that reads the pickled data into a Python object, and then whatever ‘get’ methods you need to get the info out.

Then in your settings.py you just call the initialisation method. Anything that needs to get info from it just imports the module and uses the get methods.
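A rough sketch of what such a module could look like. The class name `PickledData`, the module-level instance `store`, and the `get`/`all` methods are all hypothetical; the demo at the bottom writes a throwaway pickle file so the example runs on its own.

```python
import os
import pickle
import tempfile


class PickledData:
    """Holds the unpickled object in memory after init() has run once."""

    def __init__(self):
        self._items = None
        self.loads = 0  # how many times the file was actually read

    def init(self, path):
        # Read the pickle only on the first call; later calls do nothing.
        if self._items is None:
            with open(path, "rb") as fh:
                self._items = pickle.load(fh)
            self.loads += 1

    def get(self, index):
        return self._items[index]

    def all(self):
        return list(self._items)


# Module-level instance: every importer shares this one object per process.
store = PickledData()

# Demo: create a small pickle file, initialise twice, then read from memory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "items.pkl")
    with open(path, "wb") as fh:
        pickle.dump(["x", "y", "z"], fh)
    store.init(path)
    store.init(path)  # no second read
    second_item = store.get(1)
    everything = store.all()
```

Other code would then import `store` from that module and call its get methods. As with the module-level cache above, each server process holds its own copy.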

1👍

You could load it in and then use the Django caching framework to store it; that way it would only be loaded once.

http://docs.djangoproject.com/en/dev/topics/cache/
