[Django]-How does encoding in email subjects work? (Django/Python)

4👍

See RFC 2047 for a complete description of the format of internationalized email headers. The basic format is "=?" charset "?" encoding "?" encoded-text "?=", where encoding is either "B" (base64) or "Q" (a variant of quoted-printable), case-insensitive. So in your case, you have a base64-encoded UTF-8 string.
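To make the format concrete, here is a minimal Python 3 sketch that splits a single encoded-word by hand. The helper name `parse_encoded_word` is made up for illustration, and the sketch assumes exactly one well-formed encoded-word with no surrounding text; real code should rely on the standard library's `email.header` module instead.

```python
import base64
import quopri


def parse_encoded_word(word):
    """Decode one RFC 2047 encoded-word of the form =?charset?encoding?text?=.

    Hypothetical helper for illustration only: it handles a single,
    well-formed encoded-word and nothing else.
    """
    parts = word.split("?")
    if len(parts) != 5 or parts[0] != "=" or parts[4] != "=":
        raise ValueError("not a single RFC 2047 encoded-word")
    _, charset, encoding, text, _ = parts

    if encoding.lower() == "b":
        # "B" encoding is plain base64.
        raw = base64.b64decode(text)
    elif encoding.lower() == "q":
        # "Q" encoding is quoted-printable where "_" stands for a space.
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    else:
        raise ValueError("unknown encoding: %r" % encoding)

    # The charset field says how to turn the raw bytes into text.
    return raw.decode(charset)


subject = parse_encoded_word(
    "=?utf-8?b?WW91IGdvdCBhIGxldHRlciBmcm9tIERhxJdyaXVzIMSZxJfEr8SZxJfEr8SZ?="
)
```

The three fields between the question marks map directly onto the steps: pick the transfer decoding from the encoding letter, then decode the resulting bytes with the named charset.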

In Python 2, you can use email.header.decode_header together with str.decode to decode it and get a proper Unicode string:

>>> import email.header
>>> x = email.header.decode_header('=?utf-8?b?WW91IGdvdCBhIGxldHRlciBmcm9tIERhxJdyaXVzIMSZxJfEr8SZxJfEr8SZ?=')
>>> x
[('You got a letter from Da\xc4\x97rius \xc4\x99\xc4\x97\xc4\xaf\xc4\x99\xc4\x97\xc4\xaf\xc4\x99', 'utf-8')]
>>> x[0][0].decode(x[0][1])
u'You got a letter from Da\u0117rius \u0119\u0117\u012f\u0119\u0117\u012f\u0119'
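The session above is Python 2 (note the `u'...'` result and `str.decode`). As a rough Python 3 equivalent, the sketch below combines `decode_header` with `make_header`, both from the standard `email.header` module. The wrapper name `decode_subject` is an assumption for illustration and is not part of the original answer.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Return a header value as a str, decoding any RFC 2047 encoded-words.

    Hypothetical convenience wrapper: decode_header() splits the value into
    (data, charset) pairs, and make_header() reassembles them into a Header
    object whose str() is the decoded text.
    """
    return str(make_header(decode_header(raw_subject)))


encoded = "=?utf-8?b?WW91IGdvdCBhIGxldHRlciBmcm9tIERhxJdyaXVzIMSZxJfEr8SZxJfEr8SZ?="
decoded = decode_subject(encoded)
print(decoded)
```

In Python 3, `decode_header` returns the encoded part as `bytes` paired with its charset, so calling `.decode(charset)` on each pair by hand also works; `make_header` simply saves handling the case where the charset is `None`.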

3👍

You should look at the email.header module in the Python standard library. In particular, at the end of the documentation, there’s a decode_header() function you can use to do most of the hard work for you.

0👍

The subject line is UTF-8, but you're reading it as ASCII. You're safest reading it all as UTF-8, since ASCII is effectively just a subset of UTF-8.
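A small Python 3 sketch of the subset relationship, with a made-up helper name (`decodes_as_ascii`): pure-ASCII bytes decode to the same text under either codec, while the UTF-8 bytes of a non-ASCII character are rejected by the ASCII codec.

```python
def decodes_as_ascii(data):
    """Return True if the bytes are valid ASCII (hypothetical helper)."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


ascii_bytes = b"You got a letter"
utf8_bytes = "Da\u0117rius".encode("utf-8")  # contains a non-ASCII character

# Every ASCII byte sequence is also valid UTF-8 with the same meaning.
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# The reverse does not hold: multi-byte UTF-8 sequences are not ASCII.
ascii_ok = decodes_as_ascii(ascii_bytes)
utf8_ok_as_ascii = decodes_as_ascii(utf8_bytes)
```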
