String.maketrans for English and Persian numbers

Question

I have a function like this:

persian_numbers = '۱۲۳۴۵۶۷۸۹۰'
english_numbers = '1234567890'
arabic_numbers  = '١٢٣٤٥٦٧٨٩٠'

english_trans   = string.maketrans(english_numbers, persian_numbers)
arabic_trans    = string.maketrans(arabic_numbers, persian_numbers)

text.translate(english_trans)
text.translate(arabic_trans)

I want it to translate all Arabic and English numbers to Persian. But Python says:

english_translate = string.maketrans(english_numbers, persian_numbers)
ValueError: maketrans arguments must have same length

I tried encoding the strings as UTF-8, but I always got errors. Sometimes the problem is the Arabic string instead. Do you know a better solution for this job?

EDIT:

It seems the problem is the length of Unicode characters when they are stored in a byte string. An Arabic digit like '۱' takes up two characters there, which I found out with ord(). The length problem starts from here :-(

What do you want? To change English characters to Persian? Why don't you create a custom function? – Mohammad Efazati, Aug 09 '12 at 08:29
Did you mean something like a regular expression substitution? If I can't find a solution this way, I'll have to use something like that! Actually, I saw that the translate function works fine in Ruby! – Shahin, Aug 09 '12 at 08:35
Please, don't do guesswork when dealing with international character sets. Check the article I've linked in my answer, even if you prefer to work in Ruby. – jsbueno, Aug 09 '12 at 12:48

score 37 · Accepted Answer · answered Apr 25 '18 at 13:47

See the unidecode library, which transliterates Unicode text into plain ASCII. It is very useful when numbers are entered in different languages.

In Python 2:

>>> from unidecode import unidecode
>>> a = unidecode(u"۰۱۲۳۴۵۶۷۸۹")
>>> a
'0123456789'
>>> unidecode(a)
'0123456789'

In Python 3:

>>> from unidecode import unidecode
>>> a = unidecode("۰۱۲۳۴۵۶۷۸۹")
>>> a
'0123456789'
>>> unidecode(a)
'0123456789'

F.Tamy


You can convert all kinds of digits into English digits as I mentioned above, then use the built-in maketrans and translate functions to convert the English digits into Persian digits (or those of any other language). – F.Tamy Oct 01 '18 at 15:28
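The two-step idea in the comment above can be sketched in Python 3 as follows. This is only an illustration: it swaps in the standard-library `unicodedata.decimal` for unidecode (so non-digit characters are left untouched), and the names `to_ascii_digits`, `to_persian_digits` and `ASCII_TO_PERSIAN` are made up for the example.

```python
import unicodedata

# Persian digits are U+06F0..U+06F9; built from code points to avoid typos.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))

# Step 2 table: ASCII digits -> Persian digits.
ASCII_TO_PERSIAN = str.maketrans("0123456789", PERSIAN_DIGITS)


def to_ascii_digits(text):
    """Step 1: replace every Unicode decimal digit with its ASCII digit."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digit characters
        out.append(ch if value is None else str(value))
    return "".join(out)


def to_persian_digits(text):
    """Step 1 followed by step 2: any decimal digit -> Persian digit."""
    return to_ascii_digits(text).translate(ASCII_TO_PERSIAN)


# Arabic-Indic 1 and 2, then an ASCII 3, all become Persian digits.
print(to_persian_digits("\u0661\u0662" + "3"))
```

Because step 1 relies on the Unicode database rather than a hand-written table, it also covers digit sets other than Arabic and Persian.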

jsbueno · Answer 2 · 2016-07-17T02:58:40.580

Unicode objects can interpret these digits (Arabic and Persian) as actual digits, so there is no need to translate them by character substitution.

EDIT - I came up with a way to make your replacement using Python2 regular expressions:

# coding: utf-8

import re

# Attention: while the characters for the strings below are
# displayed identically, internally they are represented
# by distinct unicode codepoints

persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'
english_numbers = u'1234567890'


persian_regexp = u"(%s)" %  u"|".join(persian_numbers)
arabic_regexp = u"(%s)" % u"|".join(arabic_numbers)

def _sub(match_object, digits):
    return english_numbers[digits.find(match_object.group(0))]

def _sub_arabic(match_object):
    return _sub(match_object, arabic_numbers)

def _sub_persian(match_object):
    return _sub(match_object, persian_numbers)


def replace_arabic(text):
    return re.sub(arabic_regexp, _sub_arabic, text)

def replace_persian(text):
    return re.sub(persian_regexp, _sub_persian, text)

Note that the "text" parameter must itself be unicode.

(This code could also be shortened by using lambdas and combining some expressions into a single line, but there is no point in doing so other than losing readability.)

It should work for you up to here, but please read on to the original answer I had posted.
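For readers on Python 3, the same callback technique can be sketched with a single pattern, shown here only as an illustration. The names `DIGIT_RE`, `_to_english` and `replace_digits` are hypothetical, and the digit strings are built from code points (Arabic-Indic U+0660..U+0669, Persian U+06F0..U+06F9) rather than typed in.

```python
import re

ARABIC = "".join(chr(0x0660 + i) for i in range(10))
PERSIAN = "".join(chr(0x06F0 + i) for i in range(10))
BOTH = PERSIAN + ARABIC

# One character class that matches any Persian or Arabic digit.
DIGIT_RE = re.compile("[" + BOTH + "]")


def _to_english(match):
    """Callback: re.sub calls this once per match with a match object."""
    # Each block is ordered 0..9, so the index modulo 10 is the digit value.
    return str(BOTH.index(match.group(0)) % 10)


def replace_digits(text):
    return DIGIT_RE.sub(_to_english, text)


print(replace_digits("\u06f1\u0662 apples"))  # prints "12 apples"
```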

-- original answer

So, if you instantiate your variables as unicode (prepending a u to the quote char), they are correctly understood in Python:

>>> persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
>>> english_numbers = u'1234567890'
>>> arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'
>>> 
>>> print int(persian_numbers)
1234567890
>>> print int(english_numbers)
1234567890
>>> print int(arabic_numbers)
1234567890
>>> persian_numbers.isdigit()
True
>>>

By the way, the "maketrans" method does not exist for unicode objects (in Python2 - see the comments).

It is very important to understand the basics about unicode - for everyone, even people writing English only programs who think they will never deal with any char out of the 26 latin letters. When writing code that will deal with different chars it is vital - the program can't possibly work without you knowing what you are doing except by chance.

A very good article to read is http://www.joelonsoftware.com/articles/Unicode.html - please read it now. You can keep in mind, while reading it, that Python allows one to translate unicode characters to a string in any "physical" encoding by using the "encode" method of unicode objects.

>>> arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'
>>> len(arabic_numbers)
10
>>> enc_arabic = arabic_numbers.encode("utf-8")
>>> print enc_arabic
١٢٣٤٥٦٧٨٩٠
>>> len(enc_arabic)
20
>>> int(enc_arabic)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '\xd9\xa1\xd9\xa2\xd9\xa3\xd9\xa4\xd9\xa5\xd9\xa6\xd9\xa7\xd9\xa8\xd9\xa9\xd9\xa0'

Thus, the characters lose their meaning as "single entities" and as digits when encoded: the encoded object (str type in Python 2.x) is just a string of bytes. That byte string is nonetheless needed when sending these characters to any output from the program, be it the console, a GUI window, a database, HTML code, etc.
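The same distinction can be checked in Python 3, where text is `str` and encoded data is `bytes`. This is a small sketch, with the Arabic-Indic digits built from their code points (U+0660..U+0669):

```python
arabic_numbers = "".join(chr(0x0660 + i) for i in range(10))

encoded = arabic_numbers.encode("utf-8")

print(len(arabic_numbers))     # 10 characters (code points)
print(len(encoded))            # 20 bytes: each digit takes two bytes in UTF-8
print(type(encoded).__name__)  # bytes

# Decoding restores the original text, and the digits count as digits again.
decoded = encoded.decode("utf-8")
print(decoded == arabic_numbers, decoded.isdigit())
```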

I'm sorry @jsbueno but I can't understand the `match_object` in your example code! It's a compiled regular expression, but how did you make it or pass it to your functions? – Shahin, Aug 10 '12 at 08:34
The `_sub_persian` and `_sub_arabic` functions are used as callbacks by the regexp engine: they are called by the regular expression engine whenever a match for the regular expression is found. The match_object is passed to these functions. Check the docs at http://docs.python.org/library/re.html – jsbueno, Aug 10 '12 at 14:32
BTW, I stated above that "unicode does not have maketrans". That is true for Python2. Python3 strings are unicode by default, and do feature a "maketrans" method that would work without the need for the regexp gymnastics. – jsbueno, May 27 '15 at 14:30
Object `arabic_regexp` in `replace_persian` function should be replaced with `persian_regexp` – Mbt925, Mar 18 '17 at 10:14
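Following up on the Python 3 remark in the comments, here is a minimal sketch of what the question originally asked for (English and Arabic digits to Persian) using `str.maketrans`. The names `TO_PERSIAN` and `to_persian` are made up for the example, and the digit strings are built from code points to avoid look-alike typos.

```python
ENGLISH = "0123456789"
ARABIC = "".join(chr(0x0660 + i) for i in range(10))   # U+0660..U+0669
PERSIAN = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9

# One table covers both source digit sets; both arguments have length 20.
TO_PERSIAN = str.maketrans(ENGLISH + ARABIC, PERSIAN * 2)


def to_persian(text):
    """Translate English and Arabic digits to Persian, leaving the rest as is."""
    return text.translate(TO_PERSIAN)


print(to_persian("Order 12, item \u0663"))
```

In Python 3 both arguments are `str`, so their lengths are counted in characters, not bytes, and the "same length" error from the question does not occur.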

Majid · Answer 3 · 2020-07-27T11:41:35.227

11

You can use the persiantools package:

Examples:

>>> from persiantools import digits

>>> digits.en_to_fa("0987654321")
'۰۹۸۷۶۵۴۳۲۱'

>>> digits.ar_to_fa("٠٩٨٧٦٥٤٣٢١")   # or digits.ar_to_fa(u"٠٩٨٧٦٥٤٣٢١")
'۰۹۸۷۶۵۴۳۲۱'

edited Jul 27 '20 at 11:41

answered Jun 12 '19 at 20:38


this is very useful! – Daniel Jun 23 '20 at 20:15
Great. Capable of English to Persian too. – SuB May 05 '22 at 08:34
Great work! What if the number is mix of Arabic or Persian or the keyboard is unknown (either Arabic or Persian). – Ahmad Jun 14 '22 at 09:19

score 10 · Answer 4 · answered Mar 12 '20 at 08:05

unidecode transliterates all characters, not just digits, from Persian to English. If you only want to change the numbers, use the approach below:

In python3 you can use this code to convert any Persian|Arabic number to English number while keeping other characters unchanged:

intab='۱۲۳۴۵۶۷۸۹۰١٢٣٤٥٦٧٨٩٠'
outtab='12345678901234567890'
translation_table = str.maketrans(intab, outtab)
output_text = input_text.translate(translation_table)

Navid Naderi


no need libs and other headaches. It works! thanks bro! – Solivan Nov 09 '22 at 08:27

score 2 · Answer 5 · answered Aug 09 '12 at 07:58

Use Unicode Strings:

persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
english_numbers = u'1234567890'
arabic_numbers  = u'١٢٣٤٥٦٧٨٩٠'

And make sure the encoding of your Python file is correct.

I tried it before. this time the problem is: arabic_translate = string.maketrans(arabic_numbers, persian_numbers) UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128) – Shahin Aug 09 '12 at 08:02
It seems that [string.translate](http://docs.python.org/library/string.html?highlight=maketrans#string.translate) only handles ASCII. – Aug 09 '12 at 08:06
Then, is it possible for me to make a string of ascii characters instead of numbers? – Shahin Aug 09 '12 at 08:10
As Tichodroma puts it: string.translate can't handle unicode objects. It is not true that it only works with ASCII, but it only works with a string encoding that uses one byte per character. I don't think there is such an encoding that could represent both sets of digits (Arabic, Persian). – jsbueno Aug 09 '12 at 12:41

score 0 · Answer 6 · answered Apr 28 '22 at 09:02

With this you can easily do that:

def p2e(persiannumber):
    
    number={
        '0':'۰',
        '1':'۱',
        '2':'۲',
        '3':'۳',
        '4':'۴',
        '5':'۵',
        '6':'۶',
        '7':'۷',
        '8':'۸',
        '9':'۹',
   }

    for i,j in number.items():
        persiannumber=persiannumber.replace(j,i)
        
    return persiannumber

Here is the usage:

print(p2e('۳۱۹۶'))
# prints 3196

Mohammad Chenarani · Answer 7 · 2022-12-23T18:28:17.857

0

In Python 3 the easiest way is:

str(int('۱۲۳'))
#123

but if the number starts with 0, the leading zero is lost.

So we can use the zip() function instead:

for i, j in zip('1234567890', '۱۲۳۴۵۶۷۸۹۰'):
    number = number.replace(i, j)  # str.replace returns a new string, so reassign
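To make the leading-zero point concrete, here is a small sketch comparing the two approaches in the Persian-to-English direction. The helper name `persian_to_english` is made up for the example, and the Persian digits are built from code points (U+06F0..U+06F9).

```python
PERSIAN = "".join(chr(0x06F0 + i) for i in range(10))


def persian_to_english(number):
    """Replace Persian digits one by one, keeping every character position."""
    for english, persian in zip("0123456789", PERSIAN):
        number = number.replace(persian, english)
    return number


sample = PERSIAN[0] + PERSIAN[1] + PERSIAN[2]  # Persian digits for "012"

print(str(int(sample)))            # 12  (int() drops the leading zero)
print(persian_to_english(sample))  # 012 (replacement keeps it)
```

The character-by-character version is the safer choice for values such as phone numbers or postal codes, where a leading zero is significant.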

edited Dec 23 '22 at 18:28

answered Dec 23 '22 at 16:37


Hossein MEY · Answer 8 · 2021-08-26T03:32:40.060

-1

def persian_number(persiannumber):
    
    number={
        '0':'۰',
        '1':'۱',
        '2':'۲',
        '3':'۳',
        '4':'۴',
        '5':'۵',
        '6':'۶',
        '7':'۷',
        '8':'۸',
        '9':'۹',
   }

    for i,j in number.items():
        persiannumber=persiannumber.replace(i,j)
        
    return persiannumber

persiannumber must be a string

edited Aug 26 '21 at 03:32

answered Aug 25 '21 at 11:47

