Pickle data is opaque, binary data, even when you use protocol version 0:
>>> data = {1: u'\xe9'}
>>> pickled_data = pickle.dumps(data, 0)
>>> pickled_data
'(dp0\nI1\nV\xe9\np1\ns.'
When you try to store that in a TextField, Django will try to decode the data as UTF-8 in order to store it. That is what fails: this is not UTF-8 encoded text, it is binary data:
>>> pickled_data.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 9: invalid continuation byte
The solution is to not try to store this in a TextField. Use a BinaryField instead:
A field to store raw binary data. It only supports bytes assignment. Be aware that this field has limited functionality. For example, it is not possible to filter a queryset on a BinaryField value.
You have a bytes value (Python 2 strings are byte strings, renamed to bytes in Python 3).
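As a rough sketch of how this might look in practice, the helpers below convert between a Python object and the bytes you would assign to a field declared as models.BinaryField(). The helper names are hypothetical, the code is written for Python 3, and the memoryview handling rests on the assumption that some database backends return a memoryview rather than bytes when the field is read back:

```python
import pickle


def to_binary_field(obj):
    """Pickle obj into bytes suitable for assigning to a BinaryField."""
    return pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)


def from_binary_field(raw):
    """Unpickle a value read back from a BinaryField."""
    # Assumption: depending on the backend, the stored value may come back
    # as a memoryview instead of bytes, so normalize it before unpickling.
    return pickle.loads(bytes(raw))


# Round trip, simulating a backend that returns a memoryview.
stored = to_binary_field({1: "\xe9"})
loaded = from_binary_field(memoryview(stored))
```

Because the value never passes through a text codec, any pickle protocol can be used here, not just protocol 0.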
If you insist on storing the data in a text field, explicitly decode it as latin1; the Latin-1 codec maps bytes one-to-one to Unicode code points:
>>> pickled_data.decode('latin1')
u'(dp0\nI1\nV\xe9\np1\ns.'
and make sure you encode it back to bytes before unpickling:
>>> encoded = pickled_data.decode('latin1')
>>> pickle.loads(encoded)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python2.7/pickle.py", line 1381, in loads
    file = StringIO(str)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 9: ordinal not in range(128)
>>> pickle.loads(encoded.encode('latin1'))
{1: u'\xe9'}
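For comparison, here is a minimal sketch of the same Latin-1 round trip in Python 3, where the split between bytes and str is explicit. The variable names are illustrative only:

```python
import pickle

data = {1: "\xe9"}
pickled_data = pickle.dumps(data, 0)  # bytes, even with protocol 0

# Latin-1 maps every byte value 0-255 to the code point with the same
# number, so decoding cannot fail and encoding restores the exact bytes.
as_text = pickled_data.decode("latin1")
restored_bytes = as_text.encode("latin1")

# pickle.loads() needs bytes; passing the str directly raises TypeError.
try:
    pickle.loads(as_text)
    str_rejected = False
except TypeError:
    str_rejected = True

restored = pickle.loads(restored_bytes)
```

The round trip is lossless only as long as nothing alters the text between the decode and the encode step.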
Do note that if you let this value go to the browser and back again in a text field, the browser is likely to have replaced characters in that data. Internet Explorer will replace \n characters with \r\n, for example, because it assumes it is dealing with text.
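The effect of that kind of newline rewriting can be simulated with a plain string replacement. This is only a sketch in Python 3: the replace() call stands in for whatever a text-oriented layer might do, and the code deliberately records whichever outcome occurs instead of assuming one:

```python
import pickle

data = {1: "\xe9"}
pickled_data = pickle.dumps(data, 0)
as_text = pickled_data.decode("latin1")

# Simulate a text-oriented layer normalizing line endings to \r\n.
mangled_bytes = as_text.replace("\n", "\r\n").encode("latin1")

# The bytes no longer match what pickle produced, so unpickling may
# either fail or quietly return something other than the original.
try:
    result = pickle.loads(mangled_bytes)
except Exception as exc:
    outcome = "unpickling failed: %r" % (exc,)
else:
    outcome = "unpickled; matches original: %s" % (result == data)

print(mangled_bytes == pickled_data)
print(outcome)
```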
Not that you should ever accept pickle data from a network connection in any case; that is a security hole waiting to be exploited.
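To illustrate why, here is a deliberately harmless sketch (Python 3) of how a pickle can make the unpickler call a function chosen by whoever produced the bytes. The Payload class is hypothetical, and eval on a fixed arithmetic string stands in for what an attacker would replace with something destructive:

```python
import pickle


class Payload(object):
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # An attacker controls both the callable and its arguments.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading runs the embedded call; no Payload instance is ever created.
result = pickle.loads(untrusted_bytes)
```

Here result is whatever the embedded call returned, which shows that pickle.loads() executes code on behalf of the data's author.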