Getting an error in Python with characters from a Word document


The csv module in Python 2 does not support Unicode input; use https://github.com/jdunck/python-unicodecsv instead.

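Also note that \u2018 and \u2019 are the curly single quotes found in Word documents (LEFT and RIGHT SINGLE QUOTATION MARK). UTF-8 can encode them like any other Unicode character, but they also exist in the Windows cp1252 code page (as bytes 0x91 and 0x92), which is the encoding used below.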
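To make the point about the two encodings concrete, here is a small Python 3 sketch (the answer itself uses Python 2; the helper name `encodings` is just for illustration) that shows the bytes each curly quote becomes in UTF-8 and in cp1252:

```python
# Python 3 sketch: compare how Word's curly single quotes encode
# in UTF-8 versus the Windows cp1252 code page.
QUOTES = {"U+2018": "\u2018", "U+2019": "\u2019"}  # LEFT/RIGHT SINGLE QUOTATION MARK


def encodings(ch):
    """Return (utf8_bytes, cp1252_bytes) for a single character (hypothetical helper)."""
    return ch.encode("utf-8"), ch.encode("cp1252")


for name, ch in QUOTES.items():
    utf8, cp = encodings(ch)
    # UTF-8 needs three bytes per quote; cp1252 needs one.
    print(name, utf8, cp)
```

Both encodings can represent the characters; they just produce different bytes, so the file has to be read back with the same encoding it was written in.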

>>> import json
>>> x = "\\u2018f\\u2019fdsfs..."; j = json.loads('"' + x + '"')
>>> print j.encode('cp1252')
‘f’fdsfs...

Note that the string is encoded as cp1252 here, not UTF-8, so the output above only displays correctly on a console that uses cp1252.
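The json.loads trick works because the text holds literal backslash escapes rather than the quote characters themselves. As a rough Python 3 sketch, here it is next to the built-in unicode_escape codec, which is an alternative decoder (an assumption here: the input is plain ASCII and contains no unescaped double quotes, otherwise the JSON wrapping breaks):

```python
import json

# The text contains literal backslash escapes (backslash, 'u', four hex digits),
# not the curly quote characters themselves.
x = "\\u2018f\\u2019fdsfs..."

# Approach from the answer: wrap in quotes and let the JSON parser decode.
# Caveat: this fails if x contains an unescaped '"' or a non-JSON escape.
via_json = json.loads('"' + x + '"')

# Alternative: the built-in 'unicode_escape' codec (input assumed ASCII here).
via_codec = x.encode("ascii").decode("unicode_escape")

print(via_json == via_codec)

# Encoding to cp1252 turns each curly quote into a single byte.
encoded = via_json.encode("cp1252")
print(encoded)
```

Both approaches yield the same Unicode string; only the final .encode call decides which bytes are produced.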

>>> import json
>>> import unicodecsv as csv  # https://github.com/jdunck/python-unicodecsv
>>> x = "\\u2018f\\u2019fdsfs..."
>>> j = json.loads('"' + x + '"')  # decode the \uXXXX escapes into a unicode string
>>> with open("some_file.csv", "wb") as f:
...     w = csv.writer(f, encoding="cp1252")  # unicodecsv encodes each cell as cp1252
...     w.writerow([j, "normal"])
...
>>>
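For comparison, in Python 3 the standard csv module works with Unicode strings directly, so unicodecsv is not needed; the encoding is chosen when the file is opened. A minimal sketch, assuming a temporary file path in place of some_file.csv:

```python
import csv
import os
import tempfile

# Python 3: csv.writer accepts str (Unicode) values; the file object does the encoding.
text = "\u2018f\u2019fdsfs..."
path = os.path.join(tempfile.mkdtemp(), "some_file.csv")  # stand-in path for this sketch

# newline="" is what the csv docs recommend for files passed to csv.writer/reader.
with open(path, "w", newline="", encoding="cp1252") as f:
    csv.writer(f).writerow([text, "normal"])

# Inspect the raw bytes written to disk (the default dialect ends rows with \r\n).
with open(path, "rb") as f:
    raw = f.read()
print(raw)

# Read it back with the same encoding to get the original strings.
with open(path, newline="", encoding="cp1252") as f:
    rows = list(csv.reader(f))
print(rows)
```

Reading the file with a different encoding (for example UTF-8) would fail or garble the quotes, which is the same mismatch the original error comes from.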

Here is the resulting CSV file: https://www.dropbox.com/s/m4gta1o9vg8tfap/some_file.csv
