
Conversion Of Unicode

I am new to Python. I have a Unicode string in Tamil. When I call sys.getdefaultencoding() I get the output 'Cp1252'. My requirement is that when I use text = testString.decode(

Solution 1:

When I use the sys.getdefaultencoding() I get the output as "Cp1252"

Two comments on that: (1) it's "cp1252", not "Cp1252". Don't type from memory. (2) Whoever caused sys.getdefaultencoding() to produce "cp1252" should be told politely that that's not a very good idea.

As for the rest, let me guess. You have a unicode object that contains some text in the Tamil language. You try, erroneously, to decode it. Decode means to convert from a str object to a unicode object. Unfortunately you don't have a str object, and even more unfortunately you get bounced by one of the very few awkish/perlish warts in Python 2: it tries to make a str object by encoding your unicode string using the system default encoding. If that's 'ascii' or 'cp1252', encoding will fail. That's why you get a Unicode*En*codeError instead of a Unicode*De*codeError.
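To make the failure mode concrete, here is a minimal sketch of the encoding step that Python 2 performs implicitly before decode() can run. The variable name tamil and the helper try_encode are made up for illustration, and the sample word is an assumption about what the original string might contain. Calling encode() explicitly like this behaves the same way on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text: the Tamil word "தமிழ்", written with escapes
# so the example does not depend on the source file's encoding.
tamil = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"


def try_encode(text, encoding):
    """Return the encoded bytes, or the exception's class name on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return type(exc).__name__


# Neither codec has a mapping for Tamil characters, so both attempts fail
# with an *encode* error, which is the error the question reports.
ascii_result = try_encode(tamil, "ascii")
cp1252_result = try_encode(tamil, "cp1252")
print(ascii_result)
print(cp1252_result)
```

In Python 2, calling tamil.decode("utf-8") triggers this same encode step behind the scenes using the default encoding, so the traceback mentions encoding even though the code asked for decoding.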

Short answer: do text = testString.encode("utf-8"), if that's what you really want to do. Otherwise please explain what you want to do, and show us the result of print repr(testString).
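As a rough illustration of that short answer, the sketch below encodes a unicode object to UTF-8 bytes and decodes it back. The name test_string and its Tamil sample value are assumptions standing in for the asker's testString.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the asker's testString (the word "தமிழ்").
test_string = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

# encode: unicode object -> byte string
encoded = test_string.encode("utf-8")

# decode: byte string -> unicode object (the inverse direction)
decoded = encoded.decode("utf-8")

print(repr(encoded))
print(decoded == test_string)
```

Each of these Tamil code points takes three bytes in UTF-8, so the encoded form is longer than the original string.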

Solution 2:

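Add this as the first line of your code (it declares the encoding of the source file itself, so that non-ASCII literals are read correctly):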

# -*- coding: utf-8 -*- 

Later in your code, assuming testString is a byte string (str) that holds UTF-8 data:

text = unicode(testString,"UTF-8")

Solution 3:

You need to know which character encoding testString uses. If it is not UTF-8, decode('utf8') will raise an error.
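A small sketch of that point, assuming for illustration that the bytes were actually produced with UTF-16 (the variable names here are made up):

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text (the word "தமிழ்") encoded as UTF-16.
tamil = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
utf16_bytes = tamil.encode("utf-16")  # output begins with a byte order mark

# Decoding with the wrong codec fails: the BOM bytes are invalid in UTF-8.
try:
    utf16_bytes.decode("utf-8")
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "UnicodeDecodeError"

# Decoding with the codec that produced the bytes recovers the text.
recovered = utf16_bytes.decode("utf-16")

print(outcome)
print(recovered == tamil)
```

A wrong guess does not always fail this loudly. Some codecs accept almost any byte sequence and silently return the wrong characters, so it is better to find out the real encoding than to rely on an exception.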
