Unicode Problems With Web Pages In Python's Urllib
Solution 1:
As noted by Lennart, your problem is not the decoding. Python is trying to encode into "ascii", which is often a problem with print statements. I suspect the line
print str
is your problem. You need to encode the str into whatever encoding your console is using for that line to work.
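The failure can be reproduced without a console at all. This is only a minimal sketch: the sample string is made up (it is not taken from the page in the question), and the code is written so that it runs on both Python 2.6+ and Python 3.3+.

```python
# Hypothetical sample text containing an a-with-diaeresis (U+00E4).
text = u"Sel\xe4nne"

# This is what an implicit ASCII encode does when it meets a non-ASCII character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly with a codec that can represent the character succeeds.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
```

Which codec to pick depends on what the console or output file expects; latin-1 and utf-8 are only used here as illustrations.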
Solution 2:
It doesn't look like Python is "reading it in UTF-8" at all. As already pointed out, you have an encoding problem, NOT a decoding problem. That error cannot have come from the line you say it did. When asking a question like this, always give the full traceback and error message.
Kathy's suspicion is correct; in fact the print str
line is the only possible source of that error, and that can only happen when sys.stdout.encoding is not set so Python punts on 'ascii'.
Variables that may affect the outcome are what version of Python you are using, what platform you are running on and exactly how you run your script -- none of which you have told us; please do.
Example: I'm using Python 2.6.2 on Windows XP and I'm running your script with some diagnostic additions:
(1) import sys; print sys.stdout.encoding
up near the front
(2) print repr(str)
before print str
so that I can see what you've got before it crashes.
In a Command Prompt window, if I do \python26\python hockey.py
it prints cp850
as the encoding and just works.
However if I do
\python26\python hockey.py | more
or
\python26\python hockey.py >hockey.txt
it prints None
as the encoding and crashes with your error message on the first line with the a-with-diaeresis:
C:\junk>\python26\python hockey.py >hockey.txt
Traceback (most recent call last):
  File "hockey.py", line 18, in <module>
    print str
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 2: ordinal not in range(128)
If that fits your case, the fix in general is to explicitly encode your output with an encoding suited to the display mechanism you plan to use.
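One way to sketch that fix is a small helper that encodes with the stream's reported encoding and falls back to a chosen default when the stream reports none, as happens with a pipe or redirect on Python 2. The names encode_for_output and PipedStream are hypothetical, and the utf-8 fallback is an assumption; use whatever the consumer of the output actually expects.

```python
import sys


def encode_for_output(text, stream=None, fallback="utf-8"):
    """Encode text for stream, using fallback if the stream reports no encoding."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or fallback
    # "replace" substitutes "?" for characters the target codec cannot represent,
    # so the call never raises UnicodeEncodeError.
    return text.encode(encoding, "replace")


class PipedStream(object):
    """Hypothetical stand-in for a redirected stdout that reports no encoding."""
    encoding = None


payload = encode_for_output(u"Sel\xe4nne", PipedStream())
```

On Python 2 the resulting byte string can be passed straight to print. On Python 3 it would have to be written to sys.stdout.buffer instead, although there print normally handles the encoding itself.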
Solution 3:
That text is indeed ISO-8859-1, and I can decode it without a problem; your code runs without a hitch for me.
Your error, however, is an ENCODE error, not a decode error. You don't do any explicit encoding in your code, so the encoding must be happening implicitly somewhere. Possibly you have confused encoding and decoding; it's a common problem.
You DECODE from Latin1 to Unicode. You ENCODE the other way. Remember that Latin1, UTF8 etc are called "encodings".
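The two directions can be shown side by side. This is a minimal sketch with a made-up byte string standing in for what an ISO-8859-1 page would deliver; it runs on Python 2.6+ and Python 3.3+.

```python
# Hypothetical bytes as read from an ISO-8859-1 (Latin-1) page.
raw = b"Sel\xe4nne"

# DECODE: bytes in a known encoding -> Unicode text.
text = raw.decode("iso-8859-1")

# ENCODE: Unicode text -> bytes in a chosen encoding.
utf8_bytes = text.encode("utf-8")
round_trip = text.encode("iso-8859-1")
```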