[Django]-Parsing unicode input using python json.loads

12👍

Converting the byte string to a unicode string using 'latin-1' fixed the error:

UnicodeDecodeError: 'utf16' codec can't decode byte 0x38 in position 6: truncated data

Fixed code:

import json

# Python 2: decode the byte string to unicode before parsing
ustr_to_load = unicode(str_to_load, 'latin-1')

json.loads(ustr_to_load)

After that, the error is no longer raised.
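
On Python 3, where the `unicode` builtin no longer exists, the same idea might look like the sketch below. The sample bytes and the names `raw`, `text`, and `data` are assumptions made up for illustration. Keep in mind that 'latin-1' maps every possible byte to a character, so the decode never fails, but the resulting text will be wrong if the data was really in some other encoding.

```python
import json

# Hypothetical input: JSON bytes that happen to be encoded as Latin-1.
raw = '{"name": "café"}'.encode("latin-1")

# Decode the bytes to text explicitly, then parse the text.
text = raw.decode("latin-1")
data = json.loads(text)
print(data["name"])
```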

6👍

The OP clarifies (in a comment!)…:

Source data is huge unicode encoded string

Then you have to know which of the many text encodings it uses. It is clearly not 'utf-16', since that failed, but there are many others: 'utf-8', 'iso-8859-15', and so forth. You can either try them all until one works, or print repr(str_to_load[:80]) and paste what it shows as an edit to your question, so we can guess on your behalf!-).
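
The "try them all until one works" approach could be sketched roughly as follows (Python 3). The helper name `loads_with_guess` and the candidate list are hypothetical, not part of any library. Since 'latin-1' accepts any byte sequence, it is placed last as a catch-all; a successful parse there does not prove the guess was right.

```python
import json

def loads_with_guess(raw, encodings=("utf-8", "utf-16", "iso-8859-15", "latin-1")):
    """Try each candidate encoding in order.

    Returns (parsed_data, encoding) for the first candidate that both
    decodes the bytes and yields valid JSON.
    """
    last_error = None
    for enc in encodings:
        try:
            return json.loads(raw.decode(enc)), enc
        except ValueError as exc:
            # UnicodeDecodeError and json.JSONDecodeError are both
            # subclasses of ValueError, so this covers either failure.
            last_error = exc
    raise ValueError("no candidate encoding produced valid JSON") from last_error

# Hypothetical sample: the same document encoded two different ways.
utf16_bytes = '{"city": "Zürich"}'.encode("utf-16")
utf8_bytes = '{"city": "Zürich"}'.encode("utf-8")

data16, enc16 = loads_with_guess(utf16_bytes)
data8, enc8 = loads_with_guess(utf8_bytes)

# Inspecting the first bytes is often the quickest way to spot the encoding.
print(repr(utf16_bytes[:80]))
```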

6👍

The simplest way I have found is:

import simplejson as json

That way the rest of your code remains the same:

json.loads(str_to_load)

reference: https://simplejson.readthedocs.org/en/latest/
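
Since the standard library json module is derived from simplejson, a common pattern is to fall back to it when simplejson is not installed. This is a minimal sketch; the sample string and the name `data` are illustrative assumptions.

```python
try:
    import simplejson as json  # third-party package, used if installed
except ImportError:
    import json  # standard library fallback with the same loads() call

# The calling code is identical whichever module was imported.
data = json.loads('{"greeting": "h\\u00e9llo"}')
print(data["greeting"])
```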

1👍

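With Django you can use its bundled simplejson, calling loads (which parses a string) instead of load (which reads from a file object). Note that django.utils.simplejson was deprecated in Django 1.5, so this applies to older versions only.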

from django.utils import simplejson

simplejson.loads(str_to_load, "utf-8")
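
To illustrate the difference between loads and load, here is a small sketch using the standard library json module as a stand-in (an assumption, since django.utils.simplejson is not available in current Django). The names `text`, `from_string`, and `from_file` are hypothetical.

```python
import io
import json

text = '{"id": 7}'

# loads: parse JSON from a string already in memory.
from_string = json.loads(text)

# load: parse JSON by reading from a file-like object.
from_file = json.load(io.StringIO(text))
```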
