I have a text file that contains many non-English characters. I want to store this file as a sequence of numbers, such as ASCII codes.

How can I represent a non-English character as a number?

>>> str(ord('x'))
'120'
>>> str(ord('ç'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
>>> 
unutbu

1 Answer

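You first have to decode the string with the proper encoding; after that you can get the ordinal value of the character, since ord returns the integer value of a one-character string. In Python 2, a 'ç' typed in a UTF-8 terminal is a two-byte str, which is why ord rejects it until it has been decoded: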

>>> s = 'ç'
>>> s
'\xc3\xa7'
>>> print s
ç
>>> len(s)
2
>>> s.decode('utf-8')
u'\xe7'
>>> len(s.decode('utf-8'))
1
>>> ord(s.decode('utf-8'))
231
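
To apply the same decode-then-ord step to a whole file, something along these lines may work. This is a minimal sketch, assuming the file is UTF-8 encoded; the helper names `to_code_points` and `file_to_code_points` are hypothetical and not part of any library.

```python
def to_code_points(raw_bytes, encoding="utf-8"):
    """Decode raw bytes to text, then map each character to its code point.

    The encoding is an assumption; pass the one the data actually uses.
    """
    text = raw_bytes.decode(encoding)
    return [ord(ch) for ch in text]


def file_to_code_points(path, encoding="utf-8"):
    """Read a file in binary mode and convert its contents to code points."""
    with open(path, "rb") as f:
        return to_code_points(f.read(), encoding)


# b"\xc3\xa7" is the two-byte UTF-8 encoding of 'ç', followed by ASCII 'a'.
sample = b"\xc3\xa7a"
print(to_code_points(sample))  # [231, 97]
```

Reading the file in binary mode and decoding explicitly keeps the encoding choice visible, so a wrong guess fails with a UnicodeDecodeError instead of silently producing the wrong numbers.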
Iron Fist