I want to validate a text input field in an HTML page so that it accepts only Cyrillic letters. I have written the validation code in JavaScript using a regular expression like this:

var namevalue = document.getElementById("name").value;
var letters = /^[А-Яа-я]+$/;
// RegExp.prototype.test checks the field's text against the pattern
if (letters.test(namevalue)) {
  alert("Accepted");
}
else {
  alert("Enter only cyrillic letters");
}

This code works fine for all Cyrillic letters except Ё and ё.

Weafs.py
Rey Rajesh
  • try this: `var letters = /^[А-Яа-яёЁ]+$/;` – Mark Zucchini Nov 04 '14 at 09:22
  • @MarkZucchini: That's not how character classes work. Remove the `|` – Cerbrus Nov 04 '14 at 09:23
  • This might be helpful: http://en.wikipedia.org/wiki/Cyrillic_script#Computer_encoding – nhahtdh Nov 04 '14 at 09:28
  • A "Cyrillic only" requirement is vague, because there are plenty of languages that use subsets of the Cyrillic script, and characters valid in one language don't exist in another. What _language_ are you trying to match? – georg Nov 04 '14 at 09:29
  • Yes. ё isn't matched because it is outside the ranges А-Я and а-я. Those ranges are the basic Cyrillic alphabet [0410-044F], but ё isn't in that basic alphabet; it is in the Cyrillic extensions, within the block [0400-045F]. JavaScript regexes compare not the letters themselves but their character codes, so ё is simply out of range. – Mark Zucchini Nov 04 '14 at 09:37
  • @MarkZucchini, make your comments an answer, it will be the correct one, assuming “Cyrillic letter” means “letter used in modern Russian”, which is apparently the intent. – Jukka K. Korpela Nov 04 '14 at 11:40

3 Answers

ё is not matched because it lies outside the ranges А-Я and а-я. Those ranges cover the basic Cyrillic alphabet [0410-044F], but Ё (0401) and ё (0451) are not part of it; they belong to the Cyrillic extensions within the block [0400-045F]. The JavaScript regex engine compares character codes, not the letters themselves, so ё is simply out of range.
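
As a rough illustration of the point above, the following sketch prints the character code of each letter at the edges of the ranges. The helper name `hex` is made up for this example; it only formats what `charCodeAt` returns.

```javascript
// Hypothetical helper: format a character's UTF-16 code as U+XXXX.
function hex(ch) {
  return "U+" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Boundary letters of the two ranges, followed by Ё (U+0401) and ё (U+0451).
var boundaries = ["\u0410", "\u042F", "\u0430", "\u044F", "\u0401", "\u0451"];
boundaries.forEach(function (ch) {
  console.log(ch + " " + hex(ch));
});

// А-Я spans U+0410..U+042F and а-я spans U+0430..U+044F, so
// Ё sits below the first range and ё sits above the second.
var basic = /^[А-Яа-я]+$/;
console.log(basic.test("\u0401\u0436\u0438\u043A")); // false: the word starts with Ё
console.log(basic.test("\u0436\u0438\u043A"));       // true: only letters inside а-я
```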

Since I presume you mean the modern Russian language, where ё is rare but still in wide use, I suggest this solution:

var namevalue = document.getElementById("name").value;

// Please note that I added "ёЁ" to your pattern.
// It now matches all Russian Cyrillic letters, both lowercase and uppercase,
// plus ё and Ё.
var letters = /^[А-Яа-яёЁ]+$/;

// RegExp.prototype.test checks the field's text against the pattern
if (letters.test(namevalue)) {
   alert("Accepted");
}
else {
   alert("Enter only cyrillic letters");
}

Unfortunately, the problem with А-Я and Ё is buried deep in the Unicode specification. There is no plain and simple solution, so for robust programming you always need to be prepared for such cases.
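
One way to keep this check testable outside the browser is to wrap the pattern in a small function. This is only a sketch; the name `isRussianLetters` is invented here, and it assumes "Cyrillic" means the modern Russian alphabet.

```javascript
// Hypothetical helper: true when text consists only of Russian letters,
// including Ё and ё, which lie outside the ranges А-Я and а-я.
function isRussianLetters(text) {
  return /^[А-Яа-яЁё]+$/.test(text);
}

console.log(isRussianLetters("\u0401\u0436\u0438\u043A")); // true: Ё followed by three lowercase letters
console.log(isRussianLetters("Hello"));                    // false: Latin letters
console.log(isRussianLetters(""));                         // false: + requires at least one letter
```

In the page itself it could be called as `isRussianLetters(document.getElementById("name").value)`.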

Mark Zucchini

That Ë isn't necessarily in the Cyrillic alphabet and, as such, isn't caught by the А-Яа-я ranges you're using.

Is your Ë Cyrillic (U+0401) or just Latin (U+00CB)?

If you also want to catch non-Cyrillic Ë's, you may want to add this range to your regex: À-ÿ

alert(JSON.stringify("Ëë".match(/^[À-ÿ]+$/)))

If you just want to catch Ë's in the Cyrillic alphabet, try this:

Instead of starting your range at U+0410 (А), start it at U+0400 (Ѐ) and end it at U+045F (џ):

alert(JSON.stringify("Ёё".match(/^[Ѐ-џ]+$/)))

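(This last range covers U+0400 to U+045F, which includes the full Russian alphabet along with letters used in some other Cyrillic-script languages. It is not every Cyrillic letter in Unicode.)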
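
To see what the U+0400 to U+045F range does and does not accept, here is a small sketch. It writes the same range with `\u` escapes so the boundaries are visible; the variable name `block` is chosen only for this example.

```javascript
// Same range as Ѐ-џ, written with escapes: U+0400 through U+045F.
var block = /^[\u0400-\u045F]+$/;

console.log(block.test("\u0401\u0451")); // true:  Ё and ё are inside the range
console.log(block.test("\u0402"));       // true:  Ђ is used in Serbian, not in Russian
console.log(block.test("\u0490"));       // false: Ґ is used in Ukrainian but lies beyond U+045F
```

So the range is wider than Russian and narrower than the whole Cyrillic script, which matters if the field is meant for one specific language.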

Source: Unicode character codes. You can use this page to check what range(s) you need to add to your regex.

Cerbrus
  • I'm not sure what to say about the first option. It is visually similar to the Cyrillic e umlaut (U+0451), but it is in Latin script (which is unlikely to get mixed in Cyrillic text). I think your second solution is probably what OP wants, but it will include a bunch of unused Cyrillic characters. – nhahtdh Nov 04 '14 at 09:27
  • Yea, that's somewhat problematic. I'm not familiar with Cyrillic, so I don't know what characters are, and aren't used. If you are, please suggest a better range to use :-) – Cerbrus Nov 04 '14 at 09:29
  • You'll always have unused cyrillic characters unless you're only looking to cater to one language. They're all used in *some* language (o/w they wouldn't exist!), but no *single* language uses them all – blgt Nov 04 '14 at 09:31
  • There is another thing - which might be problematic: not all accented Cyrillic characters are encoded, which means that you might have to take the combining mark into account. – nhahtdh Nov 04 '14 at 09:33
  • The Latin letter Ë is irrelevant in this context. If you want to allow Latin letters that have a shape identical with a Cyrillic letter, you need to allow a lot more; allowing specifically Latin Ë would be very odd. The problem with the other part of the answer is that it allows an arbitrary collection of Cyrillic letters while rejecting many others. The expression does not stand for all Cyrillic letters, but neither does it stand for the set of letters used in Russian, which is apparently the intent. – Jukka K. Korpela Nov 04 '14 at 11:38
  • @JukkaK.Korpela: read my comments on this answer. I'm not familiar with the Cyrillic alphabet. If you know of a better range, please enlighten me. Also, I'm not just catching only that Latin Ë. This answer is not a copy-paste solution, just an explanation of what's probably going wrong, and a suggestion on how to fix it. – Cerbrus Nov 04 '14 at 11:57
  • @JukkaK.Korpela: I fixed the range in my answer. – Cerbrus Nov 04 '14 at 12:15
  • The range from U+0400 to U+045F is neither the set of all Cyrillic letters nor the set of Cyrillic letters used in Russian. – Jukka K. Korpela Nov 04 '14 at 13:46
  • @JukkaK.Korpela: Like I said before: [_"I'm not familiar with Cyrillic, so I don't know what characters are, and aren't used. If you are, please suggest a better range to use"_](http://stackoverflow.com/questions/26731220/why-regular-expression-for-cyrillic-letters-misses-a-letter/26731389?noredirect=1#comment42050647_26731389) And: [_"This answer is not a copy-paste solution, just an explanation of what's probably going wrong"_](http://stackoverflow.com/questions/26731220/why-regular-expression-for-cyrillic-letters-misses-a-letter/26731389?noredirect=1#comment42055650_26731389) – Cerbrus Nov 04 '14 at 13:47
  • @JukkaK.Korpela: Instead of just telling me I'm doing it wrong, how about a suggestion what to use, instead? – Cerbrus Nov 04 '14 at 13:48
  • I have suggested that @MarkZucchini post his comments as the correct answer. – Jukka K. Korpela Nov 04 '14 at 14:17

You can find ёЁ in the Cyrillic extensions, not in А-Яа-я.

CHANDRU S