[Django]-Right way to delay file download in Django

4👍

One solution less fancy than using Celery is to use Django’s StreamingHttpResponse:

https://docs.djangoproject.com/en/2.0/ref/request-response/#django.http.StreamingHttpResponse

With this, you use a generator function, which is a Python function that uses yield to return its results as an iterator. This allows you to return the data as you generate it, rather than all at once after you’re finished. You can yield after each line or section of the report, thus keeping a flow of data back to the browser.

But.. this only works if you are building up the finished file bit by bit, for example a CSV file. If you’re returning something that you need to format all at once, for example if you’re using something like wkhtmltopdf to generate a PDF file after you’re done, then it’s not as easy.

But there’s still a solution:

What you can do in that case is use StreamingHttpResponse along with a generator function to generate your report into a temporary file, instead of back to the browser. But as you are doing this, yield HTML snippets back to the browser which let the user know the progress, eg:

from django import http
from django.urls import reverse


def get(self, request, **kwargs):

    # first you need a tempfile name.. do that however you like
    tempfile = "kfjkdsjfksjfks"
    # then you need to create a view which will open that file and serve it
    # but I won't show that here.
    # For security reasons it has to serve only out of one directory
    # that is dedicated to this.
    fetchurl = reverse('reportgetter_url') + '?file=' + tempfile

    def reportgen():
        yield 'Starting report generation..<br>'
        # do some stuff to generate your report into the tempfile
        yield 'Doing this..<br>'
        # do this
        yield 'Doing that..<br>'
        # do that
        yield 'Finished.<br>'
        # when the browser receives this script, it'll go to fetchurl where
        # you will send them the finished report.
        yield '<script>document.location="%s";</script>' % fetchurl

    return http.StreamingHttpResponse(reportgen())

That’s not a complete example obviously, but should give you the idea.

When your user fetches this view, they will see the progress of the report as it comes along. At the end, you’re sending the JavaScript which redirects the browser to the other view you will have to write, which returns the response containing the finished file. When the browser gets this JavaScript, if the view returning the tempfile sets the response’s Content-Disposition as an attachment before returning it, eg:

response['Content-Disposition'] = 'attachment; filename="%s"' % filename

..then the browser will stay on the current page showing your progress, and simply pop up a file save dialog for the user.

For cleanup, you’ll need a cron job regardless, because if people don’t wait around, they’ll never pick up the report. Sometimes things don’t work out… So you could just clean up files older than, say, 1 hour. For a lot of systems this is acceptable.
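The cron job itself can be a few lines of plain Python; the directory name and the one-hour cutoff below are just the example values from above:

```python
import os
import time

REPORT_DIR = "/tmp/reports"  # hypothetical dedicated report directory
MAX_AGE = 60 * 60            # one hour, in seconds


def clean_old_reports(directory=REPORT_DIR, max_age=MAX_AGE):
    """Delete report files whose mtime is older than max_age seconds."""
    now = time.time()
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(name)
    return removed
```

Run it from crontab, e.g. every ten minutes, and abandoned reports never pile up.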

But if you want to clean up right away, what you can do, if you are on unix/linux, is to use an old Unix filesystem trick: files which are deleted while they are open are not really gone until they are closed. So open your tempfile, then delete it, then return your response. As soon as the response has finished sending, the space used by the file will be freed.
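The trick can be demonstrated in a few lines of plain Python (the report contents here are obviously just a stand-in):

```python
import os
import tempfile

# write the "report" to a temp file
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w') as out:
    out.write('the finished report')

f = open(path, 'rb')   # open it for serving...
os.unlink(path)        # ...then delete it: the directory entry is gone,
                       # but the data survives until the last handle closes
assert not os.path.exists(path)
data = f.read()        # still fully readable through the open handle
f.close()              # now the disk space is actually freed
```

Hand `f` to the response instead of reading it eagerly and the same guarantee holds for the whole download.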

PS: I should add that if you take this second approach, you can use one view to do both jobs, just:

import os

if 'file' in request.GET:
    # file= was in the url.. they are trying to get an already generated report.
    # thepathname must be derived safely from request.GET['file'], serving
    # only out of the dedicated report directory as discussed above.
    with open(thepathname, 'rb') as f:
        # unlink takes the path; f stays a valid open file afterwards
        os.unlink(thepathname)
        response = HttpResponse(content_type='text/csv')  # or whatever fits
        response['Content-Disposition'] = 'attachment; filename="thereport"'
        response.write(f.read())
        return response
else:
    # generate the report
    # as above

2👍

This is not really a Django question but a general architecture question.

You can always increase your server timeouts, but it would still, IMO, give you a bad user experience if the user has to sit watching the browser just spin.

Doing this in a background task is the only way to do it right. I don’t know how large the reports are, but using email can be a good solution. The background task simply generates the report, sends it via email and deletes it.

If the files are too large to send via email, then you will have to store them. Maybe send an email with a link and a message indicating the link will not work after X days/hours. Once you have a background worker, creating a daily or hourly clean up task would be very easy.

Hope it helps
