29

I have a function like this:

persian_numbers = '۱۲۳۴۵۶۷۸۹۰'
english_numbers = '1234567890'
arabic_numbers  = '١٢٣٤٥٦٧٨٩٠'

english_trans   = string.maketrans(english_numbers, persian_numbers)
arabic_trans    = string.maketrans(arabic_numbers, persian_numbers)

text.translate(english_trans)
text.translate(arabic_trans)

I want it to translate all Arabic and English numbers to Persian. But Python says:

english_translate = string.maketrans(english_numbers, persian_numbers)
ValueError: maketrans arguments must have same length

I tried encoding the strings as UTF-8, but I always got errors. Sometimes the error came from the Arabic string instead. Is there a better way to do this?

EDIT:

It seems the problem is the length of Unicode characters when they are stored in a plain byte string. A digit like '۱' takes up two characters, which I found out with ord(). That is where the length problem starts :-(

Shahin
  • What do you want? To change English characters to Persian? Why not write a custom function? – Mohammad Efazati Aug 09 '12 at 08:29
  • Do you mean something like a regular expression substitution? If I can't find a solution this way, I will have to use something like that. Actually, I saw that the translate function works fine in Ruby! – Shahin Aug 09 '12 at 08:35
  • Please don't do guesswork when dealing with international character sets. Check the article I've linked in my answer, even if you prefer to work in Ruby. – jsbueno Aug 09 '12 at 12:48

8 Answers

37

See the unidecode library, which transliterates Unicode text into plain ASCII. It is very useful when numbers are entered in different languages.

In Python 2:

>>> from unidecode import unidecode
>>> a = unidecode(u"۰۱۲۳۴۵۶۷۸۹")
>>> a
'0123456789'
>>> unidecode(a)
'0123456789'

In Python 3:

>>> from unidecode import unidecode
>>> a = unidecode("۰۱۲۳۴۵۶۷۸۹")
>>> a
'0123456789'
>>> unidecode(a)
'0123456789'
F.Tamy
  • You can convert all kinds of numbers into English numbers as I mentioned above, then use the built-in maketrans and translate functions to convert the English numbers into Persian numbers (or the numbers of any other language). – F.Tamy Oct 01 '18 at 15:28
13

Unicode objects can interpret these digits (arabic and persian) as actual digits - no need to translate them by using character substitution.

EDIT - I came up with a way to do your replacement using regular expressions in Python 2:

# coding: utf-8

import re

# Attention: while the characters in the strings below are
# displayed identically, internally they are represented
# by distinct Unicode code points

persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'
english_numbers = u'1234567890'


persian_regexp = u"(%s)" %  u"|".join(persian_numbers)
arabic_regexp = u"(%s)" % u"|".join(arabic_numbers)

def _sub(match_object, digits):
    return english_numbers[digits.find(match_object.group(0))]

def _sub_arabic(match_object):
    return _sub(match_object, arabic_numbers)

def _sub_persian(match_object):
    return _sub(match_object, persian_numbers)


def replace_arabic(text):
    return re.sub(arabic_regexp, _sub_arabic, text)

def replace_persian(text):
    return re.sub(persian_regexp, _sub_persian, text)

Note that the "text" parameter must itself be unicode.

(This code could also be shortened by using lambdas and combining some expressions into a single line, but there is no point in doing so other than losing readability.)
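For comparison, here is a minimal sketch of a shortened, single-pass variant for Python 3, where strings are Unicode by default. It is not the code from this answer: the names `DIGIT_MAP`, `DIGIT_RE` and `replace_digits` are illustrative, and the digit strings are built from their code points (U+06F0 to U+06F9 for Persian, U+0660 to U+0669 for Arabic) only so that the two look-alike sets cannot be confused.

```python
import re

# Digits 0..9 of each script, built from code points to avoid look-alike mix-ups
PERSIAN = ''.join(chr(0x06F0 + i) for i in range(10))
ARABIC = ''.join(chr(0x0660 + i) for i in range(10))

# Lookup table: every Persian or Arabic digit -> its ASCII counterpart
DIGIT_MAP = {ch: str(i) for digits in (PERSIAN, ARABIC) for i, ch in enumerate(digits)}

# One character class that matches any digit from either set
DIGIT_RE = re.compile('[' + PERSIAN + ARABIC + ']')

def replace_digits(text):
    # re.sub calls the function once per match, passing a match object
    return DIGIT_RE.sub(lambda m: DIGIT_MAP[m.group(0)], text)
```

The lambda plays the same role as the `_sub_arabic` and `_sub_persian` callbacks: the regular expression engine calls it for each match and uses the returned string as the replacement.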

This should work for you as it is, but please read on to the original answer I posted.

-- original answer

So, if you instantiate your variables as unicode (prepending a u to the quote character), they are correctly understood in Python:

>>> persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
>>> english_numbers = u'1234567890'
>>> arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'
>>> 
>>> print int(persian_numbers)
1234567890
>>> print int(english_numbers)
1234567890
>>> print int(arabic_numbers)
1234567890
>>> persian_numbers.isdigit()
True
>>> 

By the way, the "maketrans" method does not exist for unicode objects (in Python2 - see the comments).

It is very important to understand the basics of Unicode, for everyone, even people writing English-only programs who think they will never deal with any character outside the 26 Latin letters. When writing code that deals with different characters it is vital: without knowing what you are doing, the program can only work by chance.

A very good article to read is http://www.joelonsoftware.com/articles/Unicode.html - please read it now. You can keep in mind, while reading it, that Python allows one to translate unicode characters to a string in any "physical" encoding by using the "encode" method of unicode objects.

>>> arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'
>>> len(arabic_numbers)
10
>>> enc_arabic = arabic_numbers.encode("utf-8")
>>> print enc_arabic
١٢٣٤٥٦٧٨٩٠
>>> len(enc_arabic)
20
>>> int(enc_arabic)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '\xd9\xa1\xd9\xa2\xd9\xa3\xd9\xa4\xd9\xa5\xd9\xa6\xd9\xa7\xd9\xa8\xd9\xa9\xd9\xa0'

Thus, the characters lose their meaning as "single entities" and as digits when encoded. The encoded object (str type in Python 2.x) is just a string of bytes, which is nonetheless what is needed when sending these characters to any output from the program, be it console, GUI window, database, HTML code, etc.
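For readers on Python 3, where str is already Unicode and the encoded form is a separate bytes type, the same distinction can be sketched as follows. This is only an illustration: the variable names are made up for the example, and the digit string is built from code points (U+0660 to U+0669) so the exact characters are unambiguous.

```python
# Arabic-Indic digits 1..9 followed by 0, as in the session above
arabic_numbers = ''.join(chr(0x0660 + i) for i in range(1, 10)) + chr(0x0660)

# As text: ten characters, each one a decimal digit that int() understands
text_length = len(arabic_numbers)
value = int(arabic_numbers)

# As UTF-8 bytes: each of these digits is encoded in two bytes
encoded = arabic_numbers.encode('utf-8')
byte_length = len(encoded)

# Decoding the bytes gives the original text back
decoded = encoded.decode('utf-8')

print(text_length, byte_length, value)
```

The point is the same as in the Python 2 session: work on the decoded text, and encode only at the boundary where the data leaves the program.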

jsbueno
  • Thank you for your incredibly complete answer – Shahin Aug 10 '12 at 06:04
  • I'm sorry @jsbueno but I can't understand the `match_object` in your example code! it's a compiled regular expression but how did you make it or pass it to your functions? – Shahin Aug 10 '12 at 08:34
  • The `_sub_persian` and `_sub_arabic` functions are used as callbacks: the regular expression engine calls them whenever a match for the regular expression is found, and passes the match_object to them. Check the docs at http://docs.python.org/library/re.html – jsbueno Aug 10 '12 at 14:32
  • BTW, I stated above that "unicode does not have maketrans". That is true for Python 2. Python 3 strings are unicode by default and do feature a "maketrans" method that works without the need for the regexp gymnastics – jsbueno May 27 '15 at 14:30
  • Object `arabic_regexp` in `replace_persian` function should be replaced with `persian_regexp` – Mbt925 Mar 18 '17 at 10:14
11

You can use persiantools package:

Examples:

>>> from persiantools import digits

>>> digits.en_to_fa("0987654321")
'۰۹۸۷۶۵۴۳۲۱'

>>> digits.ar_to_fa("٠٩٨٧٦٥٤٣٢١")   # or digits.ar_to_fa(u"٠٩٨٧٦٥٤٣٢١")
'۰۹۸۷۶۵۴۳۲۱'
Majid
10

unidecode converts all characters from Persian to English. If you want to change only the numbers, do the following:

In python3 you can use this code to convert any Persian|Arabic number to English number while keeping other characters unchanged:

intab='۱۲۳۴۵۶۷۸۹۰١٢٣٤٥٦٧٨٩٠'
outtab='12345678901234567890'
translation_table = str.maketrans(intab, outtab)
output_text = input_text.translate(translation_table)
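The question asks for the opposite direction (English and Arabic digits to Persian), and the same technique covers it. The following is a minimal Python 3 sketch, not code from any of the posts: `to_persian_digits` and the constant names are illustrative, and the digit strings are built from their code points so that the Persian set (U+06F0 to U+06F9) and the Arabic set (U+0660 to U+0669) cannot be mixed up.

```python
# Digits 0..9 in each script
PERSIAN_DIGITS = ''.join(chr(0x06F0 + i) for i in range(10))
ARABIC_DIGITS = ''.join(chr(0x0660 + i) for i in range(10))
ENGLISH_DIGITS = '0123456789'

# Both source sets map onto the Persian digits; the two arguments
# to str.maketrans must have the same length (20 characters each here)
TO_PERSIAN = str.maketrans(ENGLISH_DIGITS + ARABIC_DIGITS, PERSIAN_DIGITS * 2)

def to_persian_digits(text):
    # Characters that are not in the table are left unchanged
    return text.translate(TO_PERSIAN)

print(to_persian_digits('Order 2024'))
```

Because the table is built once and `translate` makes a single pass over the text, the English and Arabic digits are handled together instead of in two separate calls.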
Navid Naderi
2

Use Unicode Strings:

persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
english_numbers = u'1234567890'
arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'

And make sure the encoding of your Python file is correct.

  • I tried it before. this time the problem is: arabic_translate = string.maketrans(arabic_numbers, persian_numbers) UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128) – Shahin Aug 09 '12 at 08:02
  • It seems that [string.translate](http://docs.python.org/library/string.html?highlight=maketrans#string.translate) only handles ASCII. –  Aug 09 '12 at 08:06
  • Then, is it possible for me to make a string of ASCII characters instead of numbers? – Shahin Aug 09 '12 at 08:10
  • As Tichodroma puts it, string.translate can't handle unicode objects. It is not true that it only works with ASCII, but it only works with a string encoding that uses one byte per character. I don't think there is such an encoding that could represent both sets of digits (Arabic, Persian). – jsbueno Aug 09 '12 at 12:41
0

With this you can easily do that:

def p2e(persiannumber):
    
    number={
        '0':'۰',
        '1':'۱',
        '2':'۲',
        '3':'۳',
        '4':'۴',
        '5':'۵',
        '6':'۶',
        '7':'۷',
        '8':'۸',
        '9':'۹',
   }

    for i,j in number.items():
        persiannumber=persiannumber.replace(j,i)
        
    return persiannumber

Here is the usage:

print(p2e('۳۱۹۶'))
# prints 3196
team meryb
0

In Python 3 the easiest way is:

str(int('۱۲۳'))
#123

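But if the number starts with 0, the leading zero is lost.

So we can use the zip() function instead. Strings are immutable, so the result of replace() has to be assigned back:

for i, j in zip('1234567890', '۱۲۳۴۵۶۷۸۹۰'):
    number = number.replace(i, j)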
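The leading-zero issue can be seen in a short Python 3 sketch. The variable names are illustrative, and the Persian digits are built from their code points (U+06F0 to U+06F9) so the example does not depend on how the characters are typed or displayed.

```python
persian = ''.join(chr(0x06F0 + i) for i in range(10))   # Persian digits 0..9

# A Persian-digit string with a leading zero: "012"
with_zero = persian[0] + persian[1] + persian[2]

# Going through int() drops the leading zero
via_int = str(int(with_zero))          # '12'

# Character-by-character replacement keeps every digit, zero included
number = '0912'
for i, j in zip('0123456789', persian):
    number = number.replace(i, j)      # reassign: replace() returns a new string

print(via_int, len(number))
```

This matters for values such as phone numbers or postal codes, which are digit strings and not integers.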
-1
def persian_number(persiannumber):
    
    number={
        '0':'۰',
        '1':'۱',
        '2':'۲',
        '3':'۳',
        '4':'۴',
        '5':'۵',
        '6':'۶',
        '7':'۷',
        '8':'۸',
        '9':'۹',
   }

    for i,j in number.items():
        persiannumber=persiannumber.replace(i,j)
        
    return persiannumber

persiannumber must be a string