[Django]-Python: Memory leak debugging

32👍

✅

See http://opensourcehacker.com/2008/03/07/debugging-django-memory-leak-with-trackrefs-and-guppy/ . Short answer: if you're running Django outside the web request cycle, you need to call db.reset_queries() manually (and, of course, have DEBUG=False, as others have mentioned). Django calls reset_queries() automatically after each web request, but in your setup that never happens.
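A minimal sketch of that advice for a long-running loop. The helper name `run_batch`, its `every` parameter, and the fake query log are assumptions made for illustration; the reset function is passed in as an argument so the snippet runs without Django installed.

```python
def run_batch(items, handle, reset_queries, every=1000):
    """Process items in a long-running loop, clearing the query log periodically.

    reset_queries is injected so this sketch runs without Django; in a real
    project you would pass django.db.reset_queries here.
    """
    processed = 0
    for item in items:
        handle(item)
        processed += 1
        if processed % every == 0:
            reset_queries()  # drop the query log accumulated so far
    reset_queries()  # final cleanup once the loop is done
    return processed


# Stand-in for Django's query log, purely for illustration:
# every handled item "records a query" by appending to the list.
fake_log = []
count = run_batch(range(2500), fake_log.append, fake_log.clear, every=1000)
print(count, len(fake_log))
```

In a Django script the call would look roughly like `run_batch(rows, process_row, db.reset_queries)` after `from django import db`, where `rows` and `process_row` are your own data and function.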

21👍

Is DEBUG=False in settings.py?

If not, Django will happily store every SQL query you make, which adds up.

6👍

Have you tried gc.set_debug()?

You need to ask yourself simple questions:

  • Am I using objects with __del__ methods? Do I absolutely, unequivocally, need them?
  • Can I get reference cycles in my code? Can’t we break these cycles before getting rid of the objects?

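See, the main issue would be a cycle of objects containing __del__ methods, which the collector refuses to free (this applies to Python versions before 3.4; the example is Python 2):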

import gc

class A(object):
    def __del__(self):
        print 'a deleted'
        if hasattr(self, 'b'):
            delattr(self, 'b')

class B(object):
    def __init__(self, a):
        self.a = a
    def __del__(self):
        print 'b deleted'
        del self.a


def createcycle():
    a = A()
    b = B(a)
    a.b = b
    return a, b

gc.set_debug(gc.DEBUG_LEAK)

a, b = createcycle()

# remove references
del a, b

# prints:
## gc: uncollectable <A 0x...>
## gc: uncollectable <B 0x...>
## gc: uncollectable <dict 0x...>
## gc: uncollectable <dict 0x...>
gc.collect()

# to solve this we explicitly break the cycle:
a, b = createcycle()
del a.b

del a, b

# objects are removed correctly:
## a deleted
## b deleted
gc.collect()

I would really encourage you to flag the objects and concepts that form cycles in your application and focus on their lifetime: once you don’t need them anymore, is anything still referencing them?
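One way to keep such lifetimes simple is to avoid creating the cycle in the first place. The sketch below is only an illustration under assumed names (`Parent`, `Child`, `get_parent` are hypothetical): the back-reference is stored as a `weakref.ref`, so the child never keeps its parent alive.

```python
import weakref


class Parent(object):
    def __init__(self):
        self.children = []

    def add(self, child):
        self.children.append(child)         # strong reference: parent -> child
        child.parent = weakref.ref(self)    # weak reference: child -> parent


class Child(object):
    parent = None  # replaced by a weakref.ref once the child is attached

    def get_parent(self):
        # Calling a weakref returns the target, or None once it has been freed.
        return self.parent() if self.parent is not None else None


parent = Parent()
child = Child()
parent.add(child)
assert child.get_parent() is parent

# Dropping the last strong reference frees the parent; the child's
# weak back-reference does not keep it alive.
del parent
```

Because no strong cycle exists, the parent does not have to wait for the cycle collector at all.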

Even for cycles without __del__ methods, we can have an issue:

import gc

# class without destructor
class A(object): pass

def createcycle():
    # a -> b -> c 
    # ^         |
    # ^<--<--<--|
    a = A()
    b = A()
    a.next = b
    c = A()
    b.next = c
    c.next = a
    return a, b, c

gc.set_debug(gc.DEBUG_LEAK)

a, b, c = createcycle()
# since we have no __del__ methods, gc is able to collect the cycle:

del a, b, c
# no panic message, everything is collectable:
##gc: collectable <A 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <dict 0x...>
gc.collect()

a, b, c = createcycle()

# but as long as we keep an exterior ref to the cycle...:
seen = dict()
seen[a] = True

# delete the cycle
del a, b, c
# nothing is collected
gc.collect()

If you have to use “seen”-like dictionaries, or history, be careful that you keep only the actual data you need, and no external references to it.
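If the "seen" bookkeeping really has to be keyed on the objects themselves, one option is a `weakref.WeakKeyDictionary`, whose entries disappear when the key object is collected. This is a sketch rather than a drop-in fix; `Node`, `make_cycle` and `survivors` are hypothetical names used only to compare the two kinds of mapping.

```python
import gc
import weakref


class Node(object):
    pass


def make_cycle():
    # a -> b -> c -> a
    a, b, c = Node(), Node(), Node()
    a.next, b.next, c.next = b, c, a
    return a, b, c


def survivors(mapping):
    """Register one node of a cycle in mapping, drop the cycle, count what is left."""
    a, b, c = make_cycle()
    mapping[a] = True
    del a, b, c
    gc.collect()
    return len(mapping)


strong_count = survivors({})                          # plain dict keeps the cycle alive
weak_count = survivors(weakref.WeakKeyDictionary())   # weak keys let it be collected
print(strong_count, weak_count)
```

The plain dict still holds one entry after collection, while the weak-keyed mapping ends up empty.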

I’m a bit disappointed by set_debug for now: I wish it could be configured to send its output somewhere other than stderr. Hopefully that will change soon.
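One way to look at what the collector finds without reading stderr is the `gc.DEBUG_SAVEALL` flag, which stores unreachable objects in `gc.garbage` instead of freeing them. The following is a rough sketch; `find_cyclic_garbage`, `leak_a_cycle` and `Node` are hypothetical helper names.

```python
import gc


class Node(object):
    pass


def find_cyclic_garbage(make_garbage):
    """Run make_garbage(), then return the objects the collector found unreachable."""
    gc.collect()                        # start from a clean slate
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)      # save unreachable objects in gc.garbage
    try:
        make_garbage()
        gc.collect()
        found = list(gc.garbage)
    finally:
        gc.set_debug(old_flags)         # restore the previous debug flags
        del gc.garbage[:]               # release the saved objects
    return found


def leak_a_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a             # two objects referencing each other


garbage = find_cyclic_garbage(leak_a_cycle)
node_count = sum(1 for obj in garbage if isinstance(obj, Node))
print(node_count)
```

The returned list can then be filtered, counted by type, or written to a log of your choice.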

6👍

See this excellent blog post from Ned Batchelder on how they tracked down a real memory leak in HP’s Tabblo. A classic and worth reading.

1👍

I think you should use different tools. Apparently, the statistics you got cover only GC objects (i.e. objects which may participate in cycles); most notably, they lack strings.

I recommend using Pympler; it should provide you with more detailed statistics.
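The point about GC-only statistics can be checked with `gc.is_tracked()`, which reports whether the cycle collector keeps track of an object. This is a small illustrative sketch using only the standard library, not Pympler itself; the sample values are arbitrary.

```python
import gc

samples = {
    "string": "hello world",
    "integer": 12345,
    "list": [],
}

# Atomic objects such as strings and integers are not tracked by the cycle
# collector, so statistics built from gc.get_objects() never include them.
tracked = {name: gc.is_tracked(obj) for name, obj in samples.items()}

for name in sorted(tracked):
    print(name, tracked[name])
```

Only the list is reported as tracked here, which is why a leak made of strings is invisible to tools that walk the collector's object list.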

1👍

Do you use any extensions? They are a wonderful place for memory leaks, and will not be tracked by Python tools.

0👍

Try Guppy.

Basically, you need more information, or a way to extract some. Guppy even provides a graphical representation of the data.
