
Python File Input String: How To Handle Escaped Unicode Characters?

In a text file (test.txt), my string looks like this: Gro\u00DFbritannien

Reading it, Python escapes the backslash:

>>> file = open('test.txt', 'r')
>>> input =

Solution 1:

You want to use the unicode_escape codec:

>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien

See the docs for the vast number of standard encodings that come as part of the Python standard library.
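
The snippet above is Python 2. In Python 3 the unicode() built-in no longer exists and str has no .decode() method, so the same idea needs a round trip through bytes. A minimal sketch, assuming the input is a str holding literal backslash-u sequences; decode_escapes is a hypothetical helper name, not something from the standard library:

```python
def decode_escapes(text):
    # Hypothetical helper: turn literal backslash escapes inside a str
    # into the characters they name.
    # unicode_escape decodes bytes, so encode first. 'backslashreplace'
    # rewrites any character outside Latin-1 as an escape sequence, which
    # the decode step then turns back into the original character.
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')


x = 'Gro\\u00DFbritannien'   # backslash, 'u', '0', '0', 'D', 'F' as six characters
y = decode_escapes(x)
print(y)
```

Encoding with 'latin-1' rather than 'utf-8' is a deliberate choice here: unicode_escape treats each undecorated byte as a Latin-1 character, so a UTF-8 round trip would likely garble any real non-ASCII characters already present in the string.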

Solution 2:

Use the built-in 'unicode_escape' codec:

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'

You may also use codecs.open():

>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'

The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
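
Both sessions above are Python 2, where reading a file returns a byte string that has a .decode() method. A rough Python 3 equivalent, as a sketch only: it opens the file in binary mode so that readline() returns bytes, then decodes them with the same codec. The temporary file stands in for test.txt and is an assumption made so the example runs on its own:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for test.txt: the file holds a literal backslash sequence.
    path = os.path.join(tmp, 'test.txt')
    with open(path, 'wb') as f:
        f.write(b'Gro\\u00DFbritannien\n')

    # Binary mode yields bytes, which is what unicode_escape decodes.
    with open(path, 'rb') as f:
        raw = f.readline()

decoded = raw.decode('unicode_escape')
print(repr(raw))   # the bytes as stored, backslash included
print(decoded)     # the text with the escape resolved
```

This assumes the file contains only ASCII plus escape sequences, as in the question. If it also holds raw non-ASCII text in some other encoding, decoding the bytes directly with unicode_escape would probably not give the intended result.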
