I want to prevent users from entering double-byte characters, such as Korean or Chinese, in input fields. The following code only allows a-z and A-Z. Users should still be able to enter Spanish characters, since those are not double-byte characters. The restriction should also apply when a user copies and pastes double-byte characters.

$("#myTextBox").bind("keypress", function(event) {
    var charCode = event.which;
    var keyChar = String.fromCharCode(charCode);
    return /[a-zA-Z]/.test(keyChar);
});
Kasun
  • @Ryan I have a requirement like that. I need to prevent users from entering double-byte characters, but allow them to enter special characters like those in Spanish and French. – Kasun Feb 20 '18 at 14:54
  • Where does the requirement come from? Is it actually about bytes, or something else? I ask because the concept of bytes is specific to encodings. For example, in UTF-8 – the most common encoding – ñ is two bytes. In ISO-8859-1, where it would be a single byte, the concept of “double-byte characters” doesn’t even exist and you’re missing out on a lot more than Korean. – Ry- Feb 20 '18 at 15:43
  • @Ryan, this requirement is about blocking some languages from being entered in the textbox: preventing languages like Chinese, Japanese, Korean, Hindi, etc., but allowing languages like Spanish, German, French. Is there an easy way of doing this? – Kasun Feb 21 '18 at 06:28
  • Where does *that* requirement come from? Is there a specific list of languages you need to accept, a specific list of languages you need to reject, or some other reason? There are ways to do both but you have to pick. – Ry- Feb 23 '18 at 03:59
  • Why specifically Asian? There are a lot more writing systems than Latin and Asian. Perhaps you want to *whitelist Latin*? – deceze Feb 26 '18 at 10:12
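
The observation in the comments that byte counts depend on the encoding can be checked directly. This is a minimal sketch, assuming an environment that provides the standard TextEncoder (modern browsers and Node.js); utf8Length is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8Length(text) {
    return new TextEncoder().encode(text).length;
}

console.log(utf8Length('a'));  // 1
console.log(utf8Length('ñ'));  // 2: a Spanish letter already takes two bytes in UTF-8
console.log(utf8Length('한')); // 3
```

Because ñ is itself two bytes in UTF-8, a byte-based rule cannot separate Spanish from Korean; the rule has to be expressed in terms of characters or scripts instead.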

1 Answer

Technically, you can set a pattern attribute and list all characters that are allowed, like this:

<input pattern="^[-a-zA-Z0-9 äÄöÖüÜßẞÇçâêîôûàèùéêëïü]*$" />

or, if you want to allow all Unicode characters in the range up to and including Arabic:

<input pattern="^[ -&#2303;]+$" />

Note that both solutions leave out some characters that non-Asian users may use: the first pattern omits, for instance, Scandinavian characters like å or ø, and the second omits the upper-case ẞ, emoji, and more. If you can classify the 100,000+ characters in the Unicode standard, you can simply list all allowed characters in the pattern.
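
Rather than listing characters by hand, a JavaScript regular expression can lean on the Unicode script property. The following is a minimal sketch, assuming that "allowed" means Latin-script letters plus ASCII digits, space, and a handful of punctuation marks. The name isLatinText and the exact punctuation set are illustrative assumptions, not part of the patterns above.

```javascript
// Assumed allow-list: Latin-script letters, ASCII digits, space, and . , ' -
const LATIN_TEXT = /^[\p{Script=Latin}0-9 .,'\-]*$/u;

// Hypothetical helper: true when the value contains only allow-listed characters.
function isLatinText(value) {
    // NFC composes sequences such as "e" + U+0301 into "é", so that a
    // decomposed accent (script "Inherited") is not rejected by mistake.
    return LATIN_TEXT.test(value.normalize('NFC'));
}

console.log(isLatinText('Señor Müller')); // true
console.log(isLatinText('Ærøskøbing'));   // true
console.log(isLatinText('한국어'));        // false
```

This covers å, ø, and ẞ without enumerating them, but it is still an allow-list: anything outside the Latin script, including Greek, Cyrillic, and emoji, is rejected. The same character class should also be usable in a pattern attribute, since browsers compile it in Unicode mode, but that is worth testing in the browsers you target.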

A pattern does not stop the user from typing disallowed characters, but you can use the :invalid CSS pseudo-class to give appropriate feedback. If you really want to delete the characters, you can strip them on input, like this:

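// Assumes the page contains a single <input pattern="..."> element.
const input = document.querySelector('input[pattern]');
input.addEventListener('input', () => {
    // Extract the allowed character class from the pattern attribute.
    const allowed_m = /\[([^\]]*)\]/.exec(input.pattern);
    if (!allowed_m) return;
    // Negate the class and delete every character that matches.
    const negative_pattern = new RegExp('[^' + allowed_m[1] + ']', 'g');
    input.value = input.value.replace(negative_pattern, '');
});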

All of these solutions are user-hostile, though. You will almost certainly miss corner cases (already here: is Arabic an Asian language? Are characters that occur in both Asian and non-Asian languages forbidden?), and users from all around the world will be frustrated by the experience on your website.

Instead, fix the code that deals with exotic characters, and explain to the user why Latin characters are preferred.

phihag
  • Regarding that last paragraph: sometimes it's simply a requirement to accept only certain characters for certain fields; e.g. in government software that deals with people's names, you simply sometimes need to romanise your name in a European country, or kanarise your name in Japan etc. – deceze Feb 26 '18 at 11:10
  • @phihag I'm interested in your 2nd solution in which you are using a unicode range as a pattern. I want to use unicode range from "U+0000" to "U+00FF". how can I add my range to your pattern solution? [You can find the unicode range from here](http://www.utf8-chartable.de/). I want to use "U+0000 ... U+007F: Basic Latin". – Kasun Mar 27 '18 at 06:43
  • That would be ``. Beware that this pattern contains plenty of hostile characters, such as newline, the NUL character, the DEL character, vertical tabs, the bell character, and a lot of other funky stuff. – phihag Mar 27 '18 at 08:59
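
The warning about control characters in the U+0000 to U+00FF range can be made concrete. The following is a sketch, assuming that only printable characters are wanted; the two ranges and the name isPrintableLatin1 are assumptions chosen for illustration, not the pattern from the comment above.

```javascript
// Assumption: allow printable ASCII (U+0020..U+007E) and printable Latin-1
// (U+00A0..U+00FF), skipping the C0 controls, DEL, and the C1 controls.
const PRINTABLE_LATIN1 = /^[\u0020-\u007E\u00A0-\u00FF]*$/;

// Hypothetical helper: true when the value has no characters outside those ranges.
function isPrintableLatin1(value) {
    return PRINTABLE_LATIN1.test(value);
}

console.log(isPrintableLatin1('Señor'));       // true
console.log(isPrintableLatin1('line\nbreak')); // false: newline is a control character
console.log(isPrintableLatin1('한'));          // false
```

Keep in mind that Latin-1 alone still misses letters used in some European languages, such as ő or ł, so this range has the same corner-case problem as the other allow-lists.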