3👍
How you can do it
Create a module called cache.py, then:
import cache
cache.data = getattr(cache, 'data', '') or get_my_data()
data = cache.data
This loads the data only once per server process (how many processes you have depends on your setup: your web server and whether you use WSGI or CGI). In the dev web server (./manage.py runserver), every time you modify a file the server reloads, so the cache is lost.
How it works
Modules in Python are imported only once per Python process. If you use import several times, it only returns a reference to the already imported module. So if you have Apache running mod_wsgi with 4 workers, get_my_data() will be called only 4 times, as there are only 4 Python processes running. Remember that a worker can die, be reloaded, be killed, etc., so the real count may be higher, but this should keep calls to get_my_data() to a minimum.
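As a minimal sketch of the behaviour described above: the snippet below registers an empty stand-in for cache.py in sys.modules (an assumption made so the example runs without a file on disk) and uses a hypothetical get_my_data() that counts its own calls. It only illustrates that repeated imports within one process share the same module object.

```python
import sys
import types

# Stand-in for an empty cache.py on disk (assumption for this demo),
# registered so that `import cache` finds it.
sys.modules["cache"] = types.ModuleType("cache")

calls = 0


def get_my_data():
    """Hypothetical expensive loader; counts how often it runs."""
    global calls
    calls += 1
    return {"answer": 42}


def load():
    import cache  # returns the already-imported module object

    # Store the result on the module so later calls in this process reuse it.
    if not hasattr(cache, "data"):
        cache.data = get_my_data()
    return cache.data


first = load()
second = load()
print(calls)            # 1
print(first is second)  # True
```

Each worker process has its own sys.modules, so with 4 workers the loader would run once in each of them.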
Gotcha: if one process modifies the cache data, the others won’t know about it. If your data is meant to be static, that’s fine. If you need to keep it up to date, it won’t work. This is true for this method or any method that relies on a singleton, unless you can ensure you have only one Python process running (which you can, but that is not the purpose of this answer).
About the syntax:
getattr(cache, 'data', '') returns the attribute named ‘data’ of the object ‘cache’. If it doesn’t exist, it returns the last parameter, here an empty string.
In Python, or is lazy and stops evaluating its operands as soon as it can return. In our case, if ‘data’ is an attribute of cache and holds a non-empty value, it is truthy in a boolean context, so or considers its job done (it needs only one truthy operand) and returns that value without running get_my_data(). However, if ‘data’ is not an attribute of cache, or evaluates the empty string, treats it as false, then runs get_my_data() and returns its result.
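The short-circuit behaviour can be checked in isolation. This is a small sketch in which SimpleNamespace objects stand in for the cache module and get_my_data() is a hypothetical loader that records whether it was called.

```python
from types import SimpleNamespace


def get_my_data():
    """Hypothetical loader; records that it was called."""
    get_my_data.called = True
    return "fresh data"


# Case 1: the attribute exists and is truthy, so `or` returns it
# and never evaluates the right-hand side.
filled = SimpleNamespace(data="cached data")
get_my_data.called = False
hit = getattr(filled, "data", "") or get_my_data()
hit_called = get_my_data.called

# Case 2: the attribute is missing, getattr returns '' (falsy),
# so `or` evaluates and returns get_my_data().
empty = SimpleNamespace()
get_my_data.called = False
miss = getattr(empty, "data", "") or get_my_data()
miss_called = get_my_data.called

print(hit, hit_called)    # cached data False
print(miss, miss_called)  # fresh data True
```

Note that `or` returns one of its operands rather than a boolean, and that a cached value which is itself falsy (an empty string, list or dict) would trigger a reload.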
Why you probably don’t want to do it anyway
- If you load something that takes 2 seconds to generate on every page of your website, for each request, something is wrong. You may want to rethink your architecture.
- If the data is not meant to return a value, but rather to run a process after a user action, then it’s probably better to run an asynchronous task, using tools such as Celery.
- The re module caches compiled regexes anyway, so you probably don’t need to compile them yourself. The other data can probably be expressed as primitives. Store all of it as strings and other primitives in a cache backend such as memcached or Redis; it’s going to be much cleaner. Plus, if one Python process updates the cache, the others will be aware of it. They won’t with the above snippet.
Last word about settings.py
You should not put it in the settings.py file:
- If you hardcode it, your settings file is going to be unreadable, and annoying to keep under source control.
- You can’t put it there dynamically, as settings are not meant to be changed at runtime in Django, unless you use some ugly hacks that can lead to unexpected problems.
2👍
I’d write a Python module: a singleton class with an init method that reads the pickled data into a Python object, plus whatever ‘get’ methods you need to get the info out.
Then in your settings.py you just call the initialisation method. Anything that needs to get info from it just imports the module and uses the get methods.
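A rough sketch of what such a module might look like, assuming the pickled data is a dict. The class name DataStore, its method names and the temporary pickle file are all made up for the demonstration and are not from the answer.

```python
import pickle
import tempfile
from pathlib import Path


class DataStore:
    """Process-wide holder for the unpickled data (hypothetical name)."""

    _data = None  # shared by every importer in this process

    @classmethod
    def init(cls, path):
        # Read the pickle only on the first call; later calls are no-ops.
        if cls._data is None:
            with open(path, "rb") as fh:
                cls._data = pickle.load(fh)

    @classmethod
    def get(cls, key, default=None):
        if cls._data is None:
            raise RuntimeError("DataStore.init() has not been called")
        return cls._data.get(key, default)


# Demo: write a small pickle, initialise once, then read from it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"
    path.write_bytes(pickle.dumps({"pattern": r"\d+"}))
    DataStore.init(path)

value = DataStore.get("pattern")
print(value)
```

Only unpickle files you trust, since pickle can execute arbitrary code on load. As with the module-level cache, each Python process gets its own copy.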
1👍
You could load it and then use the Django caching framework to store it; that way it would only be loaded once.