I have a validator for a website URL. It works perfectly, but I need to add the ability to type non-English characters such as Arabic. I know a little about regular expressions, but I don't know how to allow Arabic characters here, for example, instead of \w or \a.

$('#WebSiteTextbox').blur(function () {
    // Optional scheme, host, dot + extension, optional port, optional path
    var patternurl = /([\d\w]+?:\/\/)?([\w\d\.\-]+)(\.\w+)(:\d{1,5})?(\/\S*)?/i;
    if (!patternurl.test($(this).val())) {
        // Clear the field and show the error for five seconds
        $(this).val('');
        $('.ValidatorError').html('Not Valid').slideDown().delay(5000).promise().done(function () {
            $(this).slideUp();
        });
    }
});

Peter Mortensen
Pedram
  • Possible duplicate of [Include Arabic characters in JavaScript regular expression?](http://stackoverflow.com/questions/12847333/include-arabic-characters-in-javascript-regular-expression) – Emissary Nov 14 '15 at 11:48
  • No, I saw that before; it doesn't apply to my case @Emissary – Pedram Nov 14 '15 at 11:49
  • @jiff "*chars like Arabic and etc*" - What do you mean by `etc` specifically? – Mariano Nov 14 '15 at 11:49
  • @jiff it is your case... you can't use `\w` - the shorthands only encompass a small alphabet - you have to specify the unicode range instead. – Emissary Nov 14 '15 at 11:50
  • Yes, you're right, but that answer didn't help me out. I used it, but it didn't work in my case. Please see: http://jsfiddle.net/sobkqfa6/1/ @Emissary – Pedram Nov 14 '15 at 11:56
  • @Mariano it was an example; just consider Arabic and Persian – Pedram Nov 14 '15 at 11:56
  • @jiff please include in your question why the linked answer did not help you out. This way people won't close this as a duplicate. – Adriaan Nov 14 '15 at 15:46

1 Answer

You need to include the specific character ranges for Arabic and Persian characters. In JavaScript, \w is equivalent to [A-Za-z0-9_], so it never matches Arabic letters. Written out as a character class, it can be extended with any other character range.
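To illustrate the point, here is a minimal sketch comparing \w with an explicit class widened by the basic Arabic block. The variable names and the sample word are assumptions made up for this example; they are not from the question.

```javascript
// In JavaScript, \w only covers ASCII: [A-Za-z0-9_]
var asciiWord = /^\w+$/;

// The same class, widened with the basic Arabic block (U+0600 to U+06FF)
var arabicWord = /^[A-Za-z0-9_\u0600-\u06FF]+$/;

// Sample input (assumption): the Arabic word for "example",
// written with escapes so the file encoding does not matter
var sample = '\u0645\u062B\u0627\u0644';

console.log(asciiWord.test(sample));       // false: \w does not match Arabic letters
console.log(arabicWord.test(sample));      // true
console.log(arabicWord.test('example_1')); // true: ASCII input still passes
```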

From Arabic script in Unicode:

  1. Arabic (0600—06FF, 255 characters)
    1. Arabic-Indic Digits (0660-0669)
    2. Extended Arabic-Indic Digits (06F0-06F9)
  2. Arabic Supplement (0750—077F, 48 characters)
  3. Arabic Extended-A (08A0—08FF, 50 characters)
  4. Arabic Presentation Forms-A (FB50—FDFF, 611 characters)
  5. Arabic Presentation Forms-B (FE70—FEFF, 140 characters)
  6. Rumi Numeral Symbols (10E60—10E7F, 31 characters)
  7. Arabic Mathematical Alphabetic Symbols (1EE00—1EEFF, 143 characters)

The basic Arabic range encodes the standard letters and diacritics, but does not encode contextual forms (U+0621–U+0652 being directly based on ISO 8859-6); and also includes the most common diacritics and Arabic-Indic digits. The Arabic Supplement range encodes letter variants mostly used for writing African (non-Arabic) languages. The Arabic Extended-A range encodes additional Qur'anic annotations and letter variants used for various non-Arabic languages. The Arabic Presentation Forms-A range encodes contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. The Arabic Presentation Forms-B range encodes spacing forms of Arabic diacritics, and more contextual letter forms. The presentation forms are present only for compatibility with older standards, and are not currently needed for coding text. The Arabic Mathematical Alphabetical Symbols block encodes characters used in Arabic mathematical expressions.

I think you should include:

  • In \w: ranges 1 (Arabic) and 3 (Arabic Extended-A)
  • In \d: range 1.1 (Arabic-Indic Digits)
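As a rough sketch of the \d case: \d in JavaScript only matches 0-9, so the digit ranges have to be listed explicitly. The variable names are hypothetical. Adding range 1.2 (Extended Arabic-Indic Digits, the forms used in Persian) is an assumption that goes beyond the suggestion above; it is shown only for comparison.

```javascript
// \d is ASCII-only in JavaScript: [0-9]
var asciiDigits = /^\d+$/;

// [0-9] plus Arabic-Indic Digits (U+0660 to U+0669), range 1.1 in the list
var arabicDigits = /^[0-9\u0660-\u0669]+$/;

// Assumption: also accept Extended Arabic-Indic Digits (U+06F0 to U+06F9), range 1.2
var arabicPersianDigits = /^[0-9\u0660-\u0669\u06F0-\u06F9]+$/;

var arabic123 = '\u0661\u0662\u0663';  // "123" in Arabic-Indic digits
var persian123 = '\u06F1\u06F2\u06F3'; // "123" in Extended Arabic-Indic digits

console.log(asciiDigits.test(arabic123));          // false
console.log(arabicDigits.test(arabic123));         // true
console.log(arabicDigits.test(persian123));        // false
console.log(arabicPersianDigits.test(persian123)); // true
```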

I believe this would include English, Arabic and Persian:

/(\w+:\/\/)?([-.a-z0-9_\u0600-\u06FF\u08A0-\u08FF]+)(\.\w+)(:\d{1,5})?(\/\S*)?/i
  • I am assuming you can't have Arabic characters in the protocol, the extension and the port number, only in the domain.
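A quick way to check the pattern is to run it next to the original one on a few inputs. This is only a sketch: the helper names and the sample hosts are made up for illustration. Note that neither pattern is anchored, so test() succeeds as soon as any part of the input matches.

```javascript
// Pattern from the question (ASCII-only host)
var originalPattern = /([\d\w]+?:\/\/)?([\w\d\.\-]+)(\.\w+)(:\d{1,5})?(\/\S*)?/i;

// Pattern from this answer (host may also contain U+0600-U+06FF and U+08A0-U+08FF)
var arabicPattern = /(\w+:\/\/)?([-.a-z0-9_\u0600-\u06FF\u08A0-\u08FF]+)(\.\w+)(:\d{1,5})?(\/\S*)?/i;

// Hypothetical helpers, not part of the original code
function matchesOriginal(value) {
    return originalPattern.test(value);
}
function matchesArabic(value) {
    return arabicPattern.test(value);
}

// Sample hosts (assumptions), written with escapes to avoid encoding issues
var arabicHost = '\u0645\u062B\u0627\u0644.com';       // Arabic letters + ".com"
var persianHost = '\u067E\u0627\u0631\u0633\u06CC.ir'; // Persian letters + ".ir"

console.log(matchesOriginal(arabicHost));          // false
console.log(matchesArabic(arabicHost));            // true
console.log(matchesArabic('http://' + arabicHost)); // true
console.log(matchesArabic(persianHost));           // true
console.log(matchesArabic('example.com'));         // true
console.log(matchesArabic('not a url'));           // false: no dot followed by an extension
```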
Mariano
  • @jiff I'm glad it did. I should have mentioned though I don't speak Arabic so I'd recommend checking for *special* characters such as accents or diacritics to verify they're included in that range. – Mariano Nov 14 '15 at 12:27
  • yeah, sure, have to check. Thanks a lot. – Pedram Nov 14 '15 at 12:36