165👍
Unicode is not equal to UTF-8. The latter is just an encoding for the former.
You are doing it the wrong way around. You are reading UTF-8-encoded data, so you have to decode the UTF-8-encoded byte string into a Unicode string.
So just replace .encode with .decode, and it should work (if your .csv is UTF-8-encoded).
Nothing to be ashamed of, though. I bet 3 in 5 programmers had trouble at first understanding this, if not more 😉
Update:
If your input data is not UTF-8-encoded, then you have to call .decode() with the appropriate encoding, of course. If no encoding is given, Python 2 assumes ASCII (Python 3's bytes.decode() defaults to UTF-8), which fails on non-ASCII characters.
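A minimal Python 3 sketch of the same idea: bytes read from a UTF-8 file have to be decoded with the matching codec, while decoding them as ASCII fails on the first non-ASCII byte. The helper name decode_csv_bytes is hypothetical.

```python
def decode_csv_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    # Decode (bytes -> str) using the encoding the data was actually written in
    return raw.decode(encoding)

raw = "café,Zürich\n".encode("utf-8")   # what a UTF-8 .csv looks like on disk

text = decode_csv_bytes(raw)            # works: UTF-8 bytes decoded as UTF-8
print(text)

try:
    decode_csv_bytes(raw, "ascii")      # fails: byte 0xc3 is outside the ASCII range
except UnicodeDecodeError as exc:
    print("ASCII decode failed:", exc)
```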
99👍
Just add these lines to your code:
1. Python 2
import sys
reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
sys.setdefaultencoding('utf-8')
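2. Python 3
The same trick does not work: Python 3 has no sys.setdefaultencoding() (its default encoding is always UTF-8), so the following raises AttributeError. Pass encoding='utf-8' to open() or bytes.decode() instead.
import sys
from importlib import reload
reload(sys)
sys.setdefaultencoding('utf-8')  # AttributeError in Python 3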
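A quick Python 3 check of what the interpreter actually provides, as a minimal sketch:

```python
import sys

# Python 3 always uses UTF-8 as the default str <-> bytes encoding
print(sys.getdefaultencoding())

# ...and the sys module has no setdefaultencoding() to change it
print(hasattr(sys, "setdefaultencoding"))
```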
45👍
For Python 3 users, you can do:
with open(csv_name_here, 'r', encoding="utf-8") as f:
    # some code
It works with Flask too 🙂
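A self-contained sketch of the same approach with the standard csv module; read_csv_rows is a hypothetical helper, and the temporary file only stands in for your real .csv.

```python
import csv
import os
import tempfile

def read_csv_rows(path):
    # newline="" lets the csv module handle line endings itself, as its docs recommend
    with open(path, "r", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))

# Demo: write a small UTF-8 CSV with non-ASCII text, then read it back
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".csv", delete=False
) as tmp:
    tmp.write("name,city\nJosé,Zürich\n")
    path = tmp.name

rows = read_csv_rows(path)
os.remove(path)
print(rows)
```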
10👍
The main reason for the error is that the default encoding assumed by Python 2 is ASCII.
Hence, if you call encode('utf8') on a byte string that contains characters outside the ASCII range, e.g. a string like ‘hgvcj터파크387’, Python 2 first decodes it implicitly as ASCII and throws an error because those bytes are not valid ASCII.
If you are using Python 2, a common fix is to set the default encoding assumed by Python to utf8:
import sys
reload(sys)  # Python 2 only: restores sys.setdefaultencoding, which site.py deletes
sys.setdefaultencoding('utf8')
name = school_name.encode('utf8')
This way Python 2 uses UTF-8 instead of ASCII for the implicit decode, so characters outside the ASCII range no longer raise an error.
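However, in Python 3 reload() is no longer a builtin and sys.setdefaultencoding() does not exist, so you have to fix it by decoding instead. If school_name is a bytes object, decode it into a str:
name = school_name.decode('utf8')
A Python 3 str has no .decode() method: it is already Unicode, and you only call .encode('utf8') on it when you actually need bytes.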
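A minimal Python 3 sketch of "decode only if it is still bytes"; to_text is a hypothetical helper and UTF-8 is assumed to be the source encoding.

```python
def to_text(value):
    # bytes -> decode to str; str is already Unicode, so return it unchanged
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value

school_name = "hgvcj터파크387"
raw = school_name.encode("utf-8")   # what you might get from a file or socket

print(to_text(raw))
print(to_text(school_name))
```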
5👍
Check which locale you’re using with the locale command. If it’s not en_US.UTF-8, change it like this:
sudo apt install locales
sudo locale-gen en_US en_US.UTF-8
sudo dpkg-reconfigure locales
If you don’t have permission to do that you can run all your Python code like this:
PYTHONIOENCODING="UTF-8" python3 ./path/to/your/script.py
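or run this command before running your Python code, to set it in the shell you run it in:
export PYTHONIOENCODING="UTF-8"
Note that PYTHONIOENCODING only changes the encoding of stdin, stdout and stderr; files opened with open() still follow the locale unless you pass encoding= (or, on Python 3.7+, enable UTF-8 mode with PYTHONUTF8=1).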
In my case, I was using POSIX, the default Ubuntu locale, instead of en_US.UTF-8, so I saw this output:
$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
which caused Python to open files as ASCII instead of UTF-8.
You can check which locale Python is using like this:
>>> import locale
>>> locale.getpreferredencoding(False)
'ANSI_X3.4-1968'
locale.getpreferredencoding(False) is the function called by open() when you don’t provide an encoding. The output should be 'UTF-8', but in my case it was 'ANSI_X3.4-1968', the formal name of ASCII.
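A small sketch that performs the same check and normalises the result, so that aliases like 'ANSI_X3.4-1968' are recognised as ASCII; open_default_encoding is a hypothetical helper name.

```python
import codecs
import locale
import sys

def open_default_encoding() -> str:
    # The encoding open() falls back to when no encoding= argument is given
    return locale.getpreferredencoding(False)

enc = open_default_encoding()
# codecs.lookup() resolves aliases, e.g. 'ANSI_X3.4-1968' -> 'ascii', 'UTF-8' -> 'utf-8'
canonical = codecs.lookup(enc).name
print("open() default:", enc, "->", canonical)
print("UTF-8 mode:", bool(sys.flags.utf8_mode))

if canonical != "utf-8":
    print("Locale is not UTF-8: pass encoding='utf-8' to open() or fix the locale.")
```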
4👍
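For Python 3 users:
changing the encoding from 'ascii' to 'latin1' makes the error go away, because every byte value is valid Latin-1. If the file is actually UTF-8, though, its non-ASCII characters will come out garbled.
Also, you can try detecting the encoding automatically by reading the first 10000 bytes with the chardet package: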
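import chardet  # third-party package: pip install chardet
with open("dataset_path", 'rb') as rawdata:  # binary mode: chardet works on bytes
    result = chardet.detect(rawdata.read(10000))
print(result)  # a dict with 'encoding', 'confidence' and 'language' keys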
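A short sketch of why 'latin1' never raises and what it does to UTF-8 data, plus the usual round trip for repairing text that was already decoded with the wrong codec:

```python
utf8_bytes = "é".encode("utf-8")        # b'\xc3\xa9'

# Latin-1 maps every byte 0-255 to a character, so decoding never raises...
all_bytes = bytes(range(256)).decode("latin-1")

# ...but UTF-8 data decoded as Latin-1 comes out garbled (mojibake)
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# If text was already decoded with the wrong codec, this round trip recovers it
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed)
```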
1👍
If you get this error while running certbot to create or renew a certificate, search your web server and Let's Encrypt configuration for non-ASCII characters:
grep -r -P '[^\x00-\x7f]' /etc/apache2 /etc/letsencrypt /etc/nginx
That command found the offending character “´” in a comment in one .conf file. After removing it (you can edit comments as you wish) and reloading nginx, everything worked again.
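If grep -P is not available, a rough Python equivalent can do the same scan. This is a sketch, not part of certbot; find_non_ascii is a hypothetical helper, and the demo uses a throwaway directory instead of /etc.

```python
import os
import tempfile

def find_non_ascii(*roots):
    """Yield (path, line_number) for every line containing a byte >= 0x80."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:  # binary: no decoding involved
                        for lineno, line in enumerate(f, start=1):
                            if any(byte > 0x7F for byte in line):
                                yield path, lineno
                except OSError:  # unreadable file: skip it
                    continue

# Demo on a temporary directory with one offending comment
with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "site.conf")
    with open(conf, "w", encoding="utf-8") as f:
        f.write("server_name example.com;\n# it´s a comment\n")
    hits = list(find_non_ascii(tmp))
    print(hits)
```

On a real server you would call find_non_ascii("/etc/apache2", "/etc/letsencrypt", "/etc/nginx"), probably as root so every file is readable.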
1👍
Or, in Python 2, when the text you deal with is Unicode, mark it as Unicode:
set text=u'unicode text' instead of just text='unicode text'.
This worked in my case. (In Python 3 every string literal is already Unicode, so the u prefix has no effect there.)
1👍
If you are dealing with this issue inside a Docker container, it might be the case (as it was for me) that you only need to generate the locale and do nothing more:
sudo locale-gen en_US en_US.UTF-8
In my case that was sufficient because the locales package was already installed and configured. If you have to install and configure locales, add the following to your Dockerfile:
RUN apt-get update && apt-get install -y locales && \
    sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    echo 'LANG="en_US.UTF-8"' > /etc/default/locale && \
    dpkg-reconfigure --frontend=noninteractive locales && \
    update-locale LANG=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
I tested it like this:
cat <<EOF > /tmp/test.txt
++*=|@#|¼üöäàéàè!´]]¬|¢|¢¬|{ł|¼½{}}
EOF
python3 -c 'import pathlib; print(pathlib.Path("/tmp/test.txt").read_text())'
0👍
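I faced this issue while loading data with pickle (this typically happens when a file pickled under Python 2 is loaded in Python 3).
Try opening the file in binary mode and passing encoding='latin1':
with open('data.pkl', 'rb') as f:  # 'data.pkl' is a placeholder file name
    data = pickle.load(f, encoding='latin1')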
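A minimal sketch that reproduces the error without needing Python 2. The byte string is hand-written rather than produced by Python 2: a protocol-2 stream using the SHORT_BINSTRING opcode that Python 2 uses for its 8-bit str, with the UTF-8 bytes of "é" as payload.

```python
import pickle

# PROTO 2, SHORT_BINSTRING ('U') of length 2, payload b'\xc3\xa9', STOP ('.')
PY2_STYLE_PICKLE = b"\x80\x02U\x02\xc3\xa9."

try:
    pickle.loads(PY2_STYLE_PICKLE)  # default encoding="ASCII" -> UnicodeDecodeError
except UnicodeDecodeError as exc:
    print("default load failed:", exc)

as_latin1 = pickle.loads(PY2_STYLE_PICKLE, encoding="latin1")  # never fails
as_bytes = pickle.loads(PY2_STYLE_PICKLE, encoding="bytes")    # keeps the raw bytes

# If the original str held UTF-8 bytes, re-encode and decode to get the real text
print(as_latin1.encode("latin1").decode("utf-8"))
```

encoding='latin1' keeps every byte intact, which is why it is the usual choice; encoding='bytes' is the alternative when you would rather handle the raw bytes yourself.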
0👍
If you encounter this problem in a Docker container, you may need to configure the locale:
apt-get install locales
dpkg-reconfigure locales  # select en_US.UTF-8 (entry 146 in my case; the number can differ)
echo "export LC_ALL=en_US.UTF-8" >> ~/.bashrc
echo "export LANG=en_US.UTF-8" >> ~/.bashrc
echo "export LANGUAGE=en_US.UTF-8" >> ~/.bashrc
. ~/.bashrc
-1👍
Open the file with the UTF-16 encoding if that is how it was saved (in my case, a file with latitude and longitude columns):
with open(csv_name_here, 'r', encoding="utf-16") as f:
    # some code
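One way to tell whether a file really is UTF-16 before picking an encoding is to look for a byte-order mark. This is a rough sketch under that assumption; sniff_bom is a hypothetical helper, and files without a BOM still need another method (such as chardet).

```python
import codecs
import os
import tempfile

def sniff_bom(path):
    """Guess an encoding from a byte-order mark; return None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM starts with the UTF-16 LE BOM
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick the byte order
    return None

# Demo: Python's utf-16 codec writes a BOM, so the sniffer should detect it
with tempfile.NamedTemporaryFile("w", encoding="utf-16", suffix=".csv", delete=False) as tmp:
    tmp.write("lat,long\n48.85,2.35\n")
    path = tmp.name

enc = sniff_bom(path)
with open(path, "r", encoding=enc or "utf-8") as f:
    content = f.read()
os.remove(path)
print(enc, repr(content))
```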