Why am I getting 'cuMemAlloc failed: not initialized' even though I am initializing correctly?


I believe the problem you are experiencing is related to CUDA contexts. As of CUDA 4.0, a CUDA context is required per process and per device.

Behind the scenes, Celery spawns separate worker processes to run your tasks. When a worker process starts, it has no CUDA context. In pyCUDA, the context is created when the pycuda.autoinit module is imported. That is why your code works when you run it standalone (no extra process is created, so the context stays valid) or when you put import pycuda.autoinit inside the CUDA task (the worker process then creates its own context; I believe you have already tried this).

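If you want to avoid the import, you may be able to use make_default_context from pycuda.tools instead, although I'm not very familiar with pyCUDA and how it handles context management. Note that the driver has to be initialized with pycuda.driver.init() before make_default_context is called; pycuda.autoinit does this for you.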

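```python
import pycuda.driver as cuda
from pycuda.tools import make_default_context
from celery import shared_task  # Celery task decorator; older code often used @task()

@shared_task
def photo_function(photo_id):  # further parameters elided in the original
    cuda.init()  # initialize the CUDA driver in this worker process
    ctx = make_default_context()  # create a context and make it current
    try:
        print('Got photo...')
        # ... Do some stuff ...
        result = do_photo_manipulation(photo_id)
    finally:
        ctx.pop()  # deactivate the context so it can be released
    return result
```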

Beware that context creation is expensive. CUDA deliberately front-loads a lot of work into context creation to avoid unexpected delays later on. That is why there is a stack of contexts that you can push and pop between host threads (but not between processes). If your kernels run very quickly, the cost of creating and destroying a context in every task may dominate the total run time.
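To make the push/pop idea concrete, here is a toy model of a per-thread context stack in plain Python. ContextStack and hand_off are hypothetical names and no CUDA calls are made; the sketch only assumes the behavior described above: each host thread has its own stack, and a context popped in one thread can be pushed and used in another.

```python
import threading


class ContextStack:
    """Toy model of a per-host-thread stack of current contexts.

    Each thread sees its own stack; a context popped in one thread can be
    pushed in another. Illustration only: it never touches a GPU.
    """

    def __init__(self):
        self._local = threading.local()

    def _stack(self):
        # threading.local gives every thread an independent attribute namespace.
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def push(self, ctx):
        self._stack().append(ctx)

    def pop(self):
        return self._stack().pop()

    def current(self):
        stack = self._stack()
        return stack[-1] if stack else None


def hand_off(stacks, ctx):
    """Push ctx in a new worker thread and report what that thread sees."""
    seen = {}

    def worker():
        seen["before"] = stacks.current()  # a new thread starts with an empty stack
        stacks.push(ctx)
        seen["after"] = stacks.current()
        stacks.pop()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen
```

In real pyCUDA code the corresponding operations are push() and pop() on a pycuda.driver.Context object; the toy version only mirrors the bookkeeping, not the cost of creating the context in the first place.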

Answer by alexT
