How to detect non-roman characters in JS?

Question

How can I detect non-roman characters in a string? Mind you, it's not as simple as classing all characters outside of the scope A-Z and 0-9. There are lots of variations on roman characters, like the German ä, ö, ü, which are still roman. "中文", on the other hand, is clearly not roman script.

welcome to stackoverflow. We give help to specific problems, and it is common for the asker to present what he tried so far to solve his problem him/herself and get feedback and help based on that. — Winchestro, Jun 08 '14 at 16:20

Tim · Accepted Answer · 2014-06-11T13:22:20.367

JavaScript strings are natively Unicode, and the character ranges for the various scripts are well documented at http://www.unicode.org/charts/

You will see that there are several blocks that correspond to Latin (Roman) scripts. The most common of these, after plain ASCII, is the Latin-1 Supplement block (sometimes loosely called "high ASCII"), which covers U+0080–U+00FF. This includes the German characters you mention.
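To make the block boundaries concrete, here is a minimal sketch that prints the code point of a few characters and checks whether each lies in U+0080–U+00FF. The helper name inLatin1Supplement is made up for illustration, and the sketch assumes the characters are in their precomposed form (a decomposed "a" plus combining diaeresis would report U+0061 instead).

```javascript
// Hypothetical helper: true if the first code point of ch lies in the
// Latin-1 Supplement block (U+0080 to U+00FF).
function inLatin1Supplement(ch) {
  var cp = ch.codePointAt(0);
  return cp >= 0x0080 && cp <= 0x00FF;
}

// Assumes precomposed characters (e.g. U+00E4 for a-umlaut).
['\u00E4', '\u00F6', '\u00FC', 'a', '\u4E2D'].forEach(function (ch) {
  var hex = ch.codePointAt(0).toString(16).toUpperCase();
  console.log(ch, 'U+' + ('0000' + hex).slice(-4), inLatin1Supplement(ch));
});
// ä U+00E4 true
// ö U+00F6 true
// ü U+00FC true
// a U+0061 false
// 中 U+4E2D false
```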

JavaScript lets us test for Unicode ranges nicely using regular expressions. So you could detect Latin-1 Supplement characters in several strings as per this example:

var en = 'Coffee',
    fr = 'Café',
    el = 'Καφές';

// Replace every character in U+0080–U+00FF (Latin-1 Supplement) with '*'
console.log( en.replace( /[\u0080-\u00FF]/g, '*') );
console.log( fr.replace( /[\u0080-\u00FF]/g, '*') );
console.log( el.replace( /[\u0080-\u00FF]/g, '*') );

This will print out:

Coffee
Caf*
Καφές

According to our character range, only the accented é falls in the Latin-1 Supplement block, so it is the only character replaced with *. The Greek letters lie outside that range and are left untouched.

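So to better answer your question, to detect "non-roman" characters you could negate the union of the Latin blocks: U+0000–U+024F (Basic Latin through Latin Extended-B), U+1E00–U+1EFF (Latin Extended Additional), U+2C60–U+2C7F (Latin Extended-C) and U+A720–U+A7FF (Latin Extended-D):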

var str = 'a ä ö ü 中 文',
    reg = /[^\u0000-\u024F\u1E00-\u1EFF\u2C60-\u2C7F\uA720-\uA7FF]/g;

console.log( str.replace( reg, '?') );

Which would show:

a ä ö ü ? ?
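If you only need a yes/no answer rather than a replaced string, the same ranges can be wrapped in a small predicate. This is a sketch, and hasNonRoman is a made-up name. Note that it drops the g flag: a global regex keeps its position in lastIndex between calls, which can make repeated .test() calls give surprising results.

```javascript
// Same ranges as the regex above, but without the g flag so that
// .test() carries no state from one call to the next.
var NON_ROMAN = /[^\u0000-\u024F\u1E00-\u1EFF\u2C60-\u2C7F\uA720-\uA7FF]/;

// Hypothetical helper: true if str contains at least one character
// outside the Latin blocks listed in the regex.
function hasNonRoman(str) {
  return NON_ROMAN.test(str);
}

console.log( hasNonRoman('Coffee') ); // false
console.log( hasNonRoman('Café') );   // false
console.log( hasNonRoman('Καφές') );  // true
console.log( hasNonRoman('中文') );   // true
```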

You can use these ranges to do whatever it is you specifically need. I put together this crude tool for building regex from Unicode blocks, but I'm quite sure there are better resources out there.
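One such alternative, in engines that support ES2018 Unicode property escapes (an assumption about your runtime; they postdate this answer), is to match on the script property instead of hand-maintained block ranges. The sketch below treats Latin, Common (digits, spaces, punctuation) and Inherited (combining marks) as acceptable and flags everything else. The names NON_LATIN and hasNonLatinScript are made up for illustration.

```javascript
// Requires the u flag. Matches any character whose script is not
// Latin, Common (digits, spaces, punctuation) or Inherited (combining marks).
const NON_LATIN = /[^\p{Script=Latin}\p{Script=Common}\p{Script=Inherited}]/u;

// Hypothetical helper: true if str contains a character from another script.
function hasNonLatinScript(str) {
  return NON_LATIN.test(str);
}

console.log( hasNonLatinScript('a ä ö ü') );  // false
console.log( hasNonLatinScript('Καφές') );    // true
console.log( hasNonLatinScript('中文') );     // true

// The same pattern with the g flag reproduces the replacement shown earlier.
const NON_LATIN_ALL = new RegExp(NON_LATIN.source, 'gu');
console.log( 'a ä ö ü 中 文'.replace(NON_LATIN_ALL, '?') ); // a ä ö ü ? ?
```

Unlike the block-based regex, this also covers Latin letters that live in other blocks, at the cost of not working in older browsers.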
