Python File Input String: How To Handle Escaped Unicode Characters?
In a text file (test.txt), my string looks like this:
Gro\u00DFbritannien
Reading it, Python escapes the backslash:
>>> file = open('test.txt', 'r')
>>> input =
Solution 1:
You want to use the unicode_escape codec:
>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien
See the docs for the vast number of standard encodings that come as part of the Python standard library.
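The snippet above is Python 2: the unicode() builtin does not exist in Python 3, and a Python 3 str has no decode() method. A minimal Python 3 sketch of the same idea follows. The helper name decode_escapes is hypothetical (it is not part of the standard library), and the latin-1/backslashreplace round trip is one possible way to get bytes for the codec, not the only one.

```python
def decode_escapes(text):
    """Interpret backslash escapes such as \\u00DF inside a str (Python 3)."""
    # Encoding with latin-1 + backslashreplace keeps every character
    # representable: anything above U+00FF is turned into an escape
    # sequence, which the unicode_escape decoder then restores.
    raw = text.encode("latin-1", "backslashreplace")
    return raw.decode("unicode_escape")


x = "Gro\\u00DFbritannien"   # backslash, 'u', '0', '0', 'D', 'F' as stored in the file
y = decode_escapes(x)
print(y)
```

Note that this interprets every backslash escape in the text, so a literal backslash followed by n would also become a newline.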
Solution 2:
Use the built-in 'unicode_escape' codec:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'
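In Python 3 the same read-then-decode approach still works if the file is opened in binary mode, because readline() then returns bytes, which do have a decode() method. The sketch below assumes Python 3 and creates its own temporary stand-in for test.txt so that it is self-contained; the names path, raw_line and line are illustrative only.

```python
import os
import tempfile

# Create a stand-in for test.txt containing the escaped text from the question.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"Gro\\u00DFbritannien\n")

# Binary mode yields bytes, which can be decoded with the unicode_escape codec.
with open(path, "rb") as f:
    raw_line = f.readline()

line = raw_line.decode("unicode_escape")
print(repr(raw_line))  # the escaped form, still containing the backslash
print(repr(line))      # the decoded form, with the escape replaced by ß
```

Keep in mind that unicode_escape treats any non-ASCII byte as Latin-1, so this is only appropriate when the file itself is ASCII with escapes, as in the question.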
You may also use codecs.open():
>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'
The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
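For Python 3, a rough counterpart to the codecs.open() variant is to pass the codec name to the built-in open() through its encoding parameter. This is a sketch under the assumption that the file is small and pure ASCII apart from the escape sequences; the temporary file stands in for test.txt, and decoded_line is an illustrative name.

```python
import os
import tempfile

# Write a small ASCII-only stand-in for test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="ascii") as f:
    f.write("Gro\\u00DFbritannien\n")

# Let the file object apply the unicode_escape codec while reading.
with open(path, "r", encoding="unicode_escape") as f:
    decoded_line = f.readline()

print(repr(decoded_line))
```

Decoding at open time keeps the rest of the program working with ordinary str values, at the cost of applying the codec to the whole file rather than to selected lines.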