[Django]-How does django handle multiple memcached servers?

15👍

✅

It’s the actual memcached client that does the sharding. Django only passes the server configuration from settings.CACHES on to the client.

The order of the servers doesn’t matter*, but (at least for python-memcached) you can specify a ‘weight’ for each of the servers:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            ('cache1.example.org:11211', 1),
            ('cache2.example.org:11211', 10),
        ],
    }
}

I think that a quick look at memcache.py (from python-memcached) and especially memcache.Client._get_server should answer the rest of your questions:

def _get_server(self, key):
    if isinstance(key, tuple):
        serverhash, key = key
    else:
        serverhash = serverHashFunction(key)

    for i in range(Client._SERVER_RETRIES):
        server = self.buckets[serverhash % len(self.buckets)]
        if server.connect():
            #print "(using server %s)" % server,
            return server, key
        serverhash = serverHashFunction(str(serverhash) + str(i))
    return None, None

I would expect that the other memcached clients are implemented in a similar way.


Clarification by @Apreche: The order of servers does matter in one case. If you have multiple web servers and you want them all to put the same keys on the same memcached servers, you need to configure them with the same server list, in the same order, with the same weights.
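To make that concrete, here is a rough sketch of the bucketing logic described above. The hash function is a simplified stand-in for python-memcached’s default serverHashFunction (a CRC32-based hash), and the hostnames are just the placeholders from the example config; the point is that the key-to-server mapping is derived from the bucket list, which is built from the server order and weights.

import binascii

def server_hash(key):
    # Simplified stand-in for python-memcached's default serverHashFunction
    # (CRC32-based); not the exact library implementation.
    return ((binascii.crc32(key.encode()) & 0xffffffff) >> 16) & 0x7fff

# Placeholder servers with weights, as in the CACHES example above.
servers = [('cache1.example.org:11211', 1), ('cache2.example.org:11211', 10)]

# python-memcached expands weights by repeating each server 'weight' times
# in its bucket list, so changing the order or the weights changes the mapping.
buckets = [host for host, weight in servers for _ in range(weight)]

for key in ('a', 'user:42', 'session:xyz'):
    print(key, '->', buckets[server_hash(key) % len(buckets)])

Two web servers configured with the same list, order and weights will build the same buckets and therefore agree on where every key lives; change the order or the weights on one of them and they stop agreeing.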

5👍

I tested part of this and found some interesting stuff with django 1.1 and python-memcached 1.44.

On a django instance using 2 memcache servers:

cache.set('a', 1, 1000)

cache.get('a') # returned 1

I looked up which memcache server ‘a’ was sharded to by using 2 other django setups, each pointing at one of the memcache servers. I simulated a connectivity outage by putting up a firewall between the original django instance and the memcache server that ‘a’ was stored on.

cache.get('a') # paused for a few seconds and then returned None

cache.set('a', 2, 1000)

cache.get('a') # returned 2 right away

The memcache client library does update its sharding strategy if a server goes down.

Then I removed the firewall.

cache.get('a') # returned 2 for a bit until it detected the server back up then returned 1!

You can read stale data when a memcache server drops and comes back! Memcache doesn’t do anything clever to try to prevent this.

This can really mess things up if you’re using a caching strategy that puts things in memcache for a long time and depends on cache invalidation to handle updates. An old value can be left sitting on the “normal” cache server for that key: if you lose connectivity to it and an invalidation happens during that window, then when the server becomes reachable again you’ll read stale data that you shouldn’t be able to.

One more note: I’ve been reading about some object/query caching libraries and I think johnny-cache should be immune to this problem. It doesn’t explicitly invalidate entries; instead, it changes the key at which a query is cached when a table changes. So it would never accidentally read old values.

Edit: I think my note about johnny-cache working ok is crap. http://jmoiron.net/blog/is-johnny-cache-for-you/ says “there are extra cache reads on every request to load the current generations”. If the generations are stored in the cache itself, the above scenario can cause a stale generation to be read.
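For what it’s worth, here is a minimal sketch of that generation-keyed approach, using made-up key names and Django’s cache API rather than johnny-cache’s actual implementation. It shows both the idea and the weakness noted above: the generation itself is just another cache entry, so a stale generation read brings back stale query results.

import hashlib

from django.core.cache import cache

def table_generation(table):
    # The current generation for a table is itself stored in the cache; if this
    # read hits a server that was unreachable when the generation was last
    # bumped, a stale generation (and therefore stale results) comes back.
    gen = cache.get('gen:%s' % table)
    if gen is None:
        gen = 1
        cache.set('gen:%s' % table, gen, None)  # None = cache "forever"
    return gen

def query_key(table, sql):
    # Keys embed the generation, so bumping the generation changes the key
    # instead of explicitly deleting old entries.
    return 'query:%s:%s:%s' % (table, table_generation(table),
                               hashlib.md5(sql.encode()).hexdigest())

def invalidate(table):
    # Bump the generation; old keys simply fall out of use.
    try:
        cache.incr('gen:%s' % table)
    except ValueError:
        cache.set('gen:%s' % table, 1, None)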

đŸ‘€Dan Benamy

3👍

I thought I’d add this answer two years after the question was asked, since it ranks very highly in search and because we did find a situation where django was talking to only one of the memcached servers.

With a site running django 1.4.3 and python-memcached 1.51 talking to four memcached instances, we found that the database was being queried far more often than expected. Digging further, we found that cache.get() was returning None for keys that were known to be present in at least one of the memcached instances. When memcached was started with the -vv option, it showed that requests were only ever being sent to one server!
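A quick way to confirm this kind of behaviour, independent of Django, is to ask each memcached instance for the key directly (the host:port values below are placeholders); with working sharding, exactly one instance should hold it.

import memcache

for host in ('10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211', '10.0.0.4:11211'):
    client = memcache.Client([host])     # talk to one instance only
    print(host, client.get('some-key'))  # should be None everywhere but one server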

After a lot of hair had been pulled, we switched the backend to django.core.cache.backends.memcached.PyLibMCCache (pylibmc) and the problem went away.
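For anyone wanting to try the same switch, the change is confined to settings.py (the hosts below are placeholders), with the pylibmc package installed:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': [
            '10.0.0.1:11211',
            '10.0.0.2:11211',
            '10.0.0.3:11211',
            '10.0.0.4:11211',
        ],
    }
}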

đŸ‘€e4c5

2👍

If using two distinct memcached instances is what you want, django’s built-in cache framework allows for this behavior.

First you’ll want to update your settings.py:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    },
    'rusty': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11222',
    }
}

Inside your django code, the default method for accessing memcache hasn’t changed. You can now use the other cache interface as follows:

from django.core.cache import cache, caches

cache.set("activity", 'great stuff', 15 ) # Default cache
caches["rusty"].set("activity", "A great time}", 32) # New rusty cache interface

The Django documentation has a great write up covering this topic: https://docs.djangoproject.com/en/dev/topics/cache/

đŸ‘€Luke Dupin
