24👍
I’ve taken another approach to this problem.
My models have two file fields: one uses the standard file storage backend and the other uses the S3 file storage backend. When the user uploads a file, it gets stored locally.
I have a management command in my application that uploads all the locally stored files to S3 and updates the models.
So when a request comes in for a file, I check whether the model object uses the S3 storage field. If so, I send a redirect to the correct URL on S3; if not, I send a redirect so that nginx can serve the file from disk.
This management command can of course be triggered by any event, a cronjob or whatever.
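The dispatch step described above can be sketched as a small helper. The field names (`s3_file`, `local_file`) are hypothetical stand-ins for the two file fields on the model:

```python
# Minimal sketch of the serving decision. In the actual Django view this
# would become an HttpResponseRedirect(url) for the S3 case, or an empty
# HttpResponse carrying an X-Accel-Redirect header for the nginx case.
def pick_serving_strategy(doc):
    """Decide how a download request for `doc` should be answered."""
    if doc.s3_file:
        # Already transferred by the management command: send client to S3.
        return ("redirect", doc.s3_file.url)
    # Not yet transferred: internal redirect so nginx serves it from disk.
    return ("x-accel-redirect", doc.local_file.url)
```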
22👍
It’s possible to have your users upload files directly to S3 from their browser using a special form (with an encrypted policy document in a hidden field). They will be redirected back to your application once the upload completes.
More information here: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1434
18👍
There is an app for that 🙂
https://github.com/jezdez/django-queued-storage
It does exactly what you need, and much more, because you can set any “local” storage and any “remote” storage. This app will store your file in fast “local” storage (for example MogileFS storage) and then, using Celery (django-celery), will attempt to upload it asynchronously to the “remote” storage.
A few remarks:
- The tricky thing is that you can set it up with a copy & upload strategy, or with an upload & delete strategy that removes the local file once it has been uploaded.
- Second tricky thing: it will serve the file from “local” storage until it has been uploaded.
- It can also be configured to retry a number of times on upload failures.
Installation and usage are also simple and straightforward:

pip install django-queued-storage

Append to INSTALLED_APPS:

INSTALLED_APPS += ('queued_storage',)

In models.py:

from django.db import models
from queued_storage.backends import QueuedStorage

queued_s3storage = QueuedStorage(
    'django.core.files.storage.FileSystemStorage',
    'storages.backends.s3boto.S3BotoStorage',
    task='queued_storage.tasks.TransferAndDelete')

class MyModel(models.Model):
    my_file = models.FileField(upload_to='files', storage=queued_s3storage)
7👍
You could decouple the process:
- The user selects a file to upload and sends it to your server. After this they see a page: “Thank you for uploading foofile.txt, it is now stored in our storage backend.”
- When the user has uploaded the file, it is stored in a temporary directory on your server and, if needed, some metadata is stored in your database.
- A background process on your server then uploads the file to S3. This is only possible if you have full access to your server, so you can create some kind of “daemon” to do this (or simply use a cronjob).*
- The page that is displayed polls asynchronously and shows some kind of progress bar (or a simple “please wait” message) to the user. This would only be needed if the user should be able to “use” the file (put it in a message, or something like that) directly after uploading.
[*: In case you have only shared hosting, you could possibly build a solution that uses a hidden iframe in the user’s browser to start a script which then uploads the file to S3.]
3👍
You can upload media directly to the S3 server without going through your web application server.
See the following references:
Amazon API Reference : http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingHTTPPOST.html
A django implementation : https://github.com/sbc/django-uploadify-s3
0👍
As some of the answers here suggest uploading directly to S3, here’s a Django S3 Mixin using plupload:
https://github.com/burgalon/plupload-s3mixin
0👍
I encountered the same issue with uploaded images. You cannot pass files along to a Celery worker, because Celery needs to be able to pickle the arguments to a task. My solution was to deconstruct the image data into a string, gather all other info from the file, and pass this data and info to the task, where I reconstructed the image. After that you can save it, which sends it to your storage backend (such as S3). If you want to associate the image with a model, just pass the id of the instance to the task, retrieve it there, bind the image to the instance, and save the instance.
When a file has been uploaded via a form, it is available in your view as an UploadedFile, a file-like object. You can get it directly out of request.FILES, or better, first bind it to your form, run is_valid, and retrieve the file-like object from form.cleaned_data. At that point you at least know it is the kind of file you want it to be. After that you can get the data using read(), and get the other info using other methods/attributes. See https://docs.djangoproject.com/en/1.4/topics/http/file-uploads/
I actually ended up writing and distributing a little package to save an image asynchronously. Have a look at https://github.com/gterzian/django_async. Right now it’s just for images, but you could fork it and add functionality for your situation. I’m using it with https://github.com/duointeractive/django-athumb and S3.