The big problem here is that you're mixing up Python 2 and Python 3. In particular, you've written Python 3 code, and you're trying to run it in Python 2.7. But there are a few other problems along the way. So, let me try to explain everything that's going wrong.
I started compiling an SSCCE, and quickly found that the problem is only there if I try to print the value in a tuple. In other words,
print(lines[0].strip())
works fine, but
print(lines[0].strip(), lines[1].strip())
does not.
The first problem here is that the str of a tuple (or any other collection) includes the repr, not the str, of its elements. The simple way to solve this problem is to not print collections. In this case, there is really no reason to print a tuple at all; the only reason you have one is that you've built it for printing. Just do something like this:
print '({}, {})'.format(lines[0].strip(), lines[1].strip())
In cases where you already have a collection in a variable, and you want to print out the str of each element, you have to do that explicitly. You can print the repr of the str of each with this:
print tuple(map(str, my_tuple))
… or print the str of each directly with this:
print '({})'.format(', '.join(map(str, my_tuple)))
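For comparison, here is the same idea in Python 3 syntax, as a small sketch (format_elements is a hypothetical helper name, not something from the question). str() of a tuple is built from each element's repr, so strings come back quoted, while joining the str of each element gives plain text:

```python
def format_elements(items):
    """Format a collection like a tuple, but using str() of each element."""
    return '({})'.format(', '.join(map(str, items)))

pair = ('spam', 'eggs')

# str() of a tuple embeds the repr() of each element, quotes included.
print(str(pair))              # ('spam', 'eggs')

# Joining str() of each element drops the quotes.
print(format_elements(pair))  # (spam, eggs)
```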
Notice that I'm using Python 2 syntax above. That's because if you actually used Python 3, there would be no tuple in the first place, and there would also be no need to call str.
You've got a Unicode string. In Python 3, unicode and str are the same type. But in Python 2, it's bytes and str that are the same type, and unicode is a different one. So, in 2.x, you don't have a str yet, which is why you need to call str.
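On a Python 3 interpreter the split is easy to inspect: text is str, encoded data is bytes, and you only move between them by encoding or decoding explicitly. A minimal sketch (the variable names and the sample character are illustrative):

```python
text = '\u00e5'              # a str (Unicode text) in Python 3
data = text.encode('utf-8')  # encoding produces a separate bytes object

print(type(text).__name__)   # str
print(type(data).__name__)   # bytes
print(data)                  # b'\xc3\xa5'

# bytes and str never compare equal, even for the "same" character.
print(data == text)                  # False
print(data.decode('utf-8') == text)  # True
```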
And Python 2 is also why print(lines[0].strip(), lines[1].strip()) prints a tuple. In Python 3, that's a call to the print function with two strings as arguments, so it will print out two strings separated by a space. In Python 2, it's a print statement with one argument, which is a tuple.
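The difference can be reproduced on Python 3 alone by printing into an io.StringIO buffer. Passing two arguments is what the Python 3 print function sees; passing one parenthesized tuple is effectively what a Python 2 print statement sees when you write print(a, b). The sample strings are made up:

```python
import io

a, b = 'spam', 'eggs'

# Python 3 semantics: two arguments, joined by a space.
buf = io.StringIO()
print(a, b, file=buf)
two_args = buf.getvalue()

# What Python 2's print statement does with print(a, b): one tuple argument.
buf = io.StringIO()
print((a, b), file=buf)
one_tuple = buf.getvalue()

print(repr(two_args))   # 'spam eggs\n'
print(repr(one_tuple))  # "('spam', 'eggs')\n"
```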
If you want to write code that works the same in both 2.x and 3.x, you either need to avoid ever printing more than one argument, or use a wrapper like six.print_, or do a from __future__ import print_function, or be very careful to do ugly things like adding in extra parentheses to make sure your tuples are tuples in both versions.
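The first of those options can be sketched like this: build the whole line as one string before printing, so print only ever sees a single parenthesized expression, which Python 2 and Python 3 treat the same way. The helper name format_line is hypothetical, and this only removes the tuple-versus-arguments difference; on 2.x, non-ASCII unicode values still run into the encoding problems discussed in this answer.

```python
def format_line(*values):
    """Join values into a single string so print() gets exactly one argument."""
    return ' '.join('{}'.format(v) for v in values)

line = format_line('spam', 'eggs', 42)
print(line)  # spam eggs 42
```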
So, in 3.x, you've got str objects and you just print them out. In 2.x, you've got unicode objects, and you're printing out their repr. You can change that to print out their str, or to avoid printing a tuple in the first place… but that still won't help anything.
Why? Well, printing anything, in either version, just calls str on it and then passes it to sys.stdout.write. But in 3.x, str means unicode, and sys.stdout is a TextIOWrapper; in 2.x, str means bytes, and sys.stdout is a binary file.
So, the pseudocode for what ultimately happens is:
# 3.x
sys.stdout.wrapped_binary_file.write(s.encode(sys.stdout.encoding, sys.stdout.errors))
# 2.x
sys.stdout.write(s.encode(sys.getdefaultencoding()))
And, as you saw, those will do different things, because:
print(sys.getdefaultencoding(), sys.stdout.encoding, f.encoding)
yields ('ascii', 'UTF-8', None)
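Those values matter because encoding with 'ascii' fails for any character outside the first 128 code points, while UTF-8 can encode everything. A Python 3 sketch of the two encode calls from the pseudocode (encode_or_none is a hypothetical helper, and 'å' is an assumed sample character):

```python
def encode_or_none(s, encoding):
    """Encode s, returning None instead of raising if the codec can't handle it."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None

s = '\u00e5'  # 'å', outside the 128 ASCII characters

print(encode_or_none(s, 'utf-8'))  # b'\xc3\xa5': what the UTF-8 stdout does
print(encode_or_none(s, 'ascii'))  # None: the 'ascii' default can't encode it
```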
You can simulate Python 3 here by using an io.TextIOWrapper or codecs.StreamWriter and then using print >>f, … or f.write(…) instead of print, or you can explicitly encode all your unicode objects like this:
print '({})'.format(', '.join(element.encode('utf-8') for element in my_tuple))
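For reference, here is a Python 3 sketch of the codecs.StreamWriter route next to the explicit-encode route; an in-memory BytesIO stands in for the file f, and my_tuple holds made-up data. codecs.getwriter('utf-8') returns a StreamWriter class that encodes text on the way into a binary stream. Because Python 3 bytes have no .format method, the explicit route concatenates bytes instead:

```python
import codecs
import io

my_tuple = ('\u00e5', 'b')  # hypothetical data with one non-ASCII element

# Route 1: wrap a binary stream in a UTF-8 StreamWriter and write text to it.
raw = io.BytesIO()
writer = codecs.getwriter('utf-8')(raw)
writer.write('({})'.format(', '.join(my_tuple)))

# Route 2: encode each element explicitly, then join the resulting bytes.
explicit = b'(' + b', '.join(e.encode('utf-8') for e in my_tuple) + b')'

print(raw.getvalue())              # b'(\xc3\xa5, b)'
print(raw.getvalue() == explicit)  # True
```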
But really, the best way to deal with all of these problems is to run your existing Python 3 code in a Python 3 interpreter instead of a Python 2 interpreter.
If you want or need to use Python 2.7, that's fine, but you have to write Python 2 code. If you want to write Python 3 code, that's great, but you have to run Python 3.3. If you really want to write code that works properly in both, you can, but it's extra work, and takes a lot more knowledge.
For further details, see What's New In Python 3.0 (the "Print Is A Function" and "Text Vs. Data Instead Of Unicode Vs. 8-bit" sections), although that's written from the point of view of explaining 3.x to 2.x users, which is backward from what you need. The 3.x and 2.x versions of the Unicode HOWTO may also help.
For completeness: I'm reading from the files with lines = file.readlines() and printing with the standard print() function. No manual encoding or decoding happens at either end.
In Python 3.x, the standard print function just writes Unicode to sys.stdout. Since that's an io.TextIOWrapper, its write method is equivalent to this:
self.wrapped_binary_file.write(s.encode(self.encoding, self.errors))
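That equivalence can be observed on Python 3 by building a TextIOWrapper by hand over an in-memory binary buffer. The real attribute for the wrapped binary file is buffer; the BytesIO object is an assumption standing in for the terminal's byte stream, and bytes_written is a hypothetical helper:

```python
import io

def bytes_written(text, encoding):
    """Write text through a TextIOWrapper; return the bytes that reach the binary layer."""
    raw = io.BytesIO()  # stands in for the terminal's binary stream
    out = io.TextIOWrapper(raw, encoding=encoding, errors='strict')
    out.write(text)     # write() accepts str; the wrapper does the encoding
    out.flush()         # push the encoded bytes down to out.buffer (which is raw)
    return raw.getvalue()

print(bytes_written('\u00e5', 'utf-8'))    # b'\xc3\xa5'
print(bytes_written('\u00e5', 'latin-1'))  # b'\xe5'
```

The same character produces different bytes depending on the wrapper's encoding, which is why a mismatch between sys.stdout.encoding and what the terminal expects shows up as garbage.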
So one likely problem is that sys.stdout.encoding does not match your terminal's actual encoding.
And of course another is that your shell's encoding does not match your terminal window's encoding.
For example, on OS X, I create a myscript.py like this:
print('\u00e5')
Then I fire up Terminal.app, create a session profile with encoding "Western (ISO Latin 1)", create a tab with that session profile, and do this:
$ export LANG=en_US.UTF-8
$ python3 myscript.py
… and I get exactly the behavior you're seeing.
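What the terminal shows in that experiment can be reproduced without a terminal, assuming it decodes strictly as ISO-8859-1: Python 3 encodes the string as UTF-8 (because LANG says so), and the Latin-1 terminal then reads those two bytes as two separate characters. A sketch of that mismatch:

```python
s = '\u00e5'                    # 'å', the character myscript.py prints
sent = s.encode('utf-8')        # bytes Python writes, since LANG says UTF-8
shown = sent.decode('latin-1')  # how a Latin-1 terminal reads those bytes

print(sent)                     # b'\xc3\xa5'
print(len(shown))               # 2: one character became two
print(shown == '\u00c3\u00a5')  # True: the terminal displays 'Ã¥'
```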
It seems from your comment that you are using python-2 and not python-3.
If you are using python-3, it's worth reading the unicode howto guide on reading/writing to understand what python is doing.
The basic flow of encoding is:
DECODE from encoding to unicode -> Processing -> Encode from unicode to encoding
In python3 the bytes are decoded to strings and strings are encoded to bytes.
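That flow, spelled out step by step in Python 3 (the input bytes are a made-up UTF-8 sample, and upper() stands in for whatever processing you actually do):

```python
raw_in = b'\xc3\xa5ngstr\xc3\xb6m'   # UTF-8 bytes for 'ångström', e.g. read from a file

text = raw_in.decode('utf-8')        # DECODE: bytes -> str
processed = text.upper()             # Processing happens on str only
raw_out = processed.encode('utf-8')  # ENCODE: str -> bytes

print(len(raw_in), len(text))        # 10 8
print(raw_out)                       # b'\xc3\x85NGSTR\xc3\x96M'
```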
The bytes to string decoding is handled for you with open().
[..] the built-in open() function can return a file-like object that
assumes the file's contents are in a specified encoding and accepts
Unicode parameters for methods such as read() and write(). This works
through open()âs encoding and errors parameters [..]
So to read in unicode from a utf-8 encoded file you should be doing this:
# python-3
with open('utf8.txt', mode='r', encoding='utf-8') as f:
    lines = f.readlines()  # returns unicode
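A self-contained Python 3 round trip of that idea, writing a UTF-8 file into a temporary directory and reading it back with the same kind of open() call (the file name utf8.txt comes from the example above; the contents are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'utf8.txt')

    # Write text; open() encodes it to UTF-8 bytes on the way out.
    # newline='' keeps '\n' as-is on every platform.
    with open(path, mode='w', encoding='utf-8', newline='') as f:
        f.write('\u00e5\nb\n')

    # Read it back; open() decodes the bytes, so readlines() yields str.
    with open(path, mode='r', encoding='utf-8') as f:
        lines = f.readlines()

    # The file itself holds the UTF-8 encoded bytes.
    with open(path, mode='rb') as f:
        raw = f.read()

print(lines == ['\u00e5\n', 'b\n'])  # True
print(raw)                           # b'\xc3\xa5\nb\n'
```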
If you want similar functionality using python-2, you can use codecs.open():
# python-2
import codecs
with codecs.open('utf8.txt', mode='r', encoding='utf-8') as f:
    lines = f.readlines()  # returns unicode
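If one script has to run on both versions, io.open is another option: it has been available since Python 2.6, and in Python 3 it is the very same function as the built-in open. A sketch (read_utf8_lines is a hypothetical helper name):

```python
import io

def read_utf8_lines(path):
    """Return the file's lines as Unicode text on Python 2.6+ and 3.x."""
    with io.open(path, mode='r', encoding='utf-8') as f:
        return f.readlines()

print(io.open is open)  # True on Python 3
```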