[Django]-How to encode UTF8 filename for HTTP headers? (Python, Django)

35👍

This is a FAQ.

There is no interoperable way to do this. Some browsers implement proprietary extensions (IE, Chrome), others implement RFC 2231 (Firefox, Opera).

See test cases at http://greenbytes.de/tech/tc2231/.

Update: as of November 2012, all current desktop browsers support the encoding defined in RFC 6266 and RFC 5987 (Safari >= 6, IE >= 9, Chrome, Firefox, Opera, Konqueror).

32👍

Don’t send a filename in Content-Disposition. There is no way to make non-ASCII header parameters work cross-browser(*).

Instead, send just “Content-Disposition: attachment”, and leave the filename as a URL-encoded UTF-8 string in the trailing (PATH_INFO) part of your URL, for the browser to pick up and use by default. UTF-8 URLs are handled much more reliably by browsers than anything to do with Content-Disposition.

(*: actually, there’s not even a current standard that says how it should be done as the relationships between RFCs 2616, 2231 and 2047 are pretty dysfunctional, something that Julian is trying to get cleared up at a spec level. Consistent browser support is in the distant future.)
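As a rough illustration of the URL-based approach above, here is a minimal sketch of building such a download link. The `/download/<id>/` layout and the `download_url` helper are assumptions made up for this example; only `urllib.parse.quote` and `unquote` are real standard-library calls.

```python
from urllib.parse import quote, unquote

def download_url(file_id, filename):
    # Hypothetical URL layout: /download/<id>/<percent-encoded UTF-8 filename>
    # safe="" also encodes "/" so the name stays a single path segment.
    return "/download/{}/{}".format(file_id, quote(filename, safe=""))

url = download_url(42, "My Résumé.pdf")
print(url)  # /download/42/My%20R%C3%A9sum%C3%A9.pdf

# The view behind this URL would send only "Content-Disposition: attachment";
# the browser then takes the default filename from the last path segment.
print(unquote(url.rsplit("/", 1)[1]))  # My Résumé.pdf
```

The server can ignore the trailing segment entirely and look the file up by ID; the segment exists only so the browser has a name to suggest.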

31👍

Note that in 2011, RFC 6266 (especially Appendix D) weighed in on this issue and has specific recommendations to follow.

Namely, you can issue a filename with only ASCII characters, followed by filename* with an RFC 5987-formatted filename for those agents that understand it.

Typically this will look like filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf, where the Unicode filename ("My Résumé.pdf") is encoded into UTF-8 and then percent-encoded (note, do NOT use + for spaces).
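A small sketch of producing that header value with the Python standard library. The ASCII fallback name is picked by hand here, which is an assumption for the example rather than anything the RFCs prescribe.

```python
from urllib.parse import quote, quote_plus

name = "My Résumé.pdf"

# quote() encodes the string as UTF-8, then percent-encodes it; spaces become %20.
encoded = quote(name, safe="")
print(encoded)           # My%20R%C3%A9sum%C3%A9.pdf

# quote_plus() turns spaces into "+", which is wrong for filename*.
print(quote_plus(name))  # My+R%C3%A9sum%C3%A9.pdf

# Hand-picked ASCII fallback for agents that ignore filename* (assumption).
fallback = "my-resume.pdf"
header = "attachment; filename=\"{}\"; filename*=UTF-8''{}".format(fallback, encoded)
print(header)
# attachment; filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf
```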

Please do actually read RFC 6266 and RFC 5987 (or use a robust and tested library that abstracts this for you), as my summary here is lacking in important detail.

17👍

Starting with Django 2.1 (see issue #16470), you can use FileResponse, which will correctly set the Content-Disposition header for attachments. Starting with Django 3.0 (issue #30196) it will also set it correctly for inline files.

For example, to return a file named my_img.jpg with MIME type image/jpeg as an HTTP response:

from django.http import FileResponse

response = FileResponse(open("my_img.jpg", 'rb'), as_attachment=True, content_type="image/jpeg")
return response

Or, if you can’t use FileResponse, you can use the relevant part from FileResponse’s source to set the Content-Disposition header yourself. Here’s what that source currently looks like:

from urllib.parse import quote

disposition = 'attachment' if as_attachment else 'inline'
try:
    filename.encode('ascii')
    file_expr = 'filename="{}"'.format(filename)
except UnicodeEncodeError:
    file_expr = "filename*=utf-8''{}".format(quote(filename))
response.headers['Content-Disposition'] = '{}; {}'.format(disposition, file_expr)
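If you want that logic outside a response object, it could be wrapped in a plain function along these lines. The name `content_disposition` is hypothetical, and the escaping of backslashes and double quotes in the ASCII branch is an addition in this sketch that the snippet above does not have.

```python
from urllib.parse import quote

def content_disposition(filename, as_attachment=True):
    # Hypothetical helper; mirrors the either/or logic of the snippet above.
    disposition = "attachment" if as_attachment else "inline"
    try:
        filename.encode("ascii")
        # Addition in this sketch: escape characters that would break the quoted string.
        escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
        file_expr = 'filename="{}"'.format(escaped)
    except UnicodeEncodeError:
        # Non-ASCII names use the RFC 5987 extended parameter only.
        file_expr = "filename*=utf-8''{}".format(quote(filename))
    return "{}; {}".format(disposition, file_expr)

print(content_disposition("report.pdf"))
# attachment; filename="report.pdf"
print(content_disposition("Résumé.pdf", as_attachment=False))
# inline; filename*=utf-8''R%C3%A9sum%C3%A9.pdf
```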

10👍

I’ve had success with the newer RFC 5987 format, which specifies a header parameter encoded in the style of the e-mail form (RFC 2231). I came up with the following solution, which is based on code from the django-sendfile project.

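import unicodedata
# urlquote was removed in Django 4.0; on newer versions use
# `from urllib.parse import quote as urlquote` instead.
from django.utils.http import urlquote

def rfc5987_content_disposition(file_name):
    # ASCII-only fallback: decompose accented characters, then drop anything non-ASCII
    ascii_name = unicodedata.normalize('NFKD', file_name).encode('ascii','ignore').decode()
    header = 'attachment; filename="{}"'.format(ascii_name)
    if ascii_name != file_name:
        # Add the RFC 5987 extended parameter carrying the real, percent-encoded name
        quoted_name = urlquote(file_name)
        header += '; filename*=UTF-8\'\'{}'.format(quoted_name)

    return header

# e.g.
# response['Content-Disposition'] = rfc5987_content_disposition(file_name)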

I have only tested my code on Python 3.4 with Django 1.8, so the similar solution in django-sendfile may suit you better.

There’s a long-standing ticket in Django’s tracker which acknowledges this, but no patches have yet been proposed as far as I can tell. So unfortunately this is as close to a robust, tested library as I could find. Please let me know if there’s a better solution.
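To see what the NFKD-based ASCII fallback in the function above actually produces, here is a standalone sketch using only the standard library. The helper name `ascii_fallback` is made up for this illustration.

```python
import unicodedata

def ascii_fallback(file_name):
    # Same normalization step as in rfc5987_content_disposition above:
    # decompose characters (é -> e + combining accent), then drop non-ASCII.
    return unicodedata.normalize("NFKD", file_name).encode("ascii", "ignore").decode()

print(ascii_fallback("My Résumé.pdf"))  # My Resume.pdf
print(ascii_fallback("日本語.txt"))      # .txt
```

Names written in scripts with no ASCII decomposition collapse to just the extension, so it may be worth substituting a generic default name when the fallback comes out empty or starts with a dot.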

👤Will S

2👍

The escape_uri_path function from Django is the solution that worked for me.

See the Django docs for escape_uri_path to check which RFC standards it currently follows.

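from django.http import HttpResponse
from django.utils.encoding import escape_uri_path

file = "response.zip"
response = HttpResponse(content_type='application/zip')
# escape_uri_path percent-encodes the name for use in the filename* parameter
response['Content-Disposition'] = f"attachment; filename*=utf-8''{escape_uri_path(file)}"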

-1👍

A hack (C#, ASP.NET):

if (Request.UserAgent.Contains("IE"))
{
  // IE will accept URL encoding, but spaces don't need to be, and since they're so common..
  filename = filename.Replace("%", "%25").Replace(";", "%3B").Replace("#", "%23").Replace("&", "%26");
}
👤anon
