How can I tell if a string has any non-ASCII characters in it?

Question

I'm looking to detect internationalized domain names and local parts in email addresses, and would like to know if there is a quick and easy way to do this with regex or otherwise in JavaScript.

What do you mean by ASCII? Remember that NUL (\0), BEL (\7, which causes a PC to beep) and ESC (\033) are also valid ASCII characters, but most wouldn't consider them to be valid ASCII text. – slebetman Nov 23 '12 at 04:13

alex · Answer 1 · 2017-01-16T19:37:34.207

31

This should do it...

var hasMoreThanAscii = !/^[\u0000-\u007f]*$/.test(str);

...also...

var hasMoreThanAscii = str
                       .split("")
                       .some(function(char) { return char.charCodeAt(0) > 127 });

ES6 goodness...

let hasMoreThanAscii = [...str].some(char => char.charCodeAt(0) > 127);
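
A quick sanity check, as a rough sketch: the regex matches only when every character is ASCII, so it has to be negated to mean "has more than ASCII", and it should then agree with the `.some()` version. The names `byRegex` and `bySome` and the sample strings are made up for illustration and are not part of the answer.

```javascript
// Negated regex: true when at least one character is outside U+0000..U+007F.
const byRegex = (str) => !/^[\u0000-\u007f]*$/.test(str);

// Spread version: true when any character's first code unit is above 127.
const bySome = (str) => [...str].some((ch) => ch.charCodeAt(0) > 127);

// Illustrative inputs only.
const samples = ["", "hello", "a/b", "héllo", "日本語", "emoji 😀"];
for (const s of samples) {
  console.log(JSON.stringify(s), byRegex(s), bySome(s));
}
```

Both helpers return false for the empty string, which matches the point made in the comments that `""` contains no non-ASCII characters.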

edited Jan 16 '17 at 19:37

answered Nov 23 '12 at 03:40

alex


Shouldn't the `+` be a `*`? This requires that the string has characters in it, but the empty string `""` fulfills the OP's strict requirements: it doesn't have any non-ASCII characters in it. – Jeff Nov 23 '12 at 03:50
If you change your `.filter` to `.some`, you can get rid of `.length > 0` – I Hate Lazy Nov 23 '12 at 03:52
@user1689607 True. I'll also get rid of a bit of browser support ;) – alex Nov 23 '12 at 03:52
Nah, any browser that supports `.filter()`, supports `.some()`. They're both ES5 additions. :) – I Hate Lazy Nov 23 '12 at 03:54
@alex: Sorry, my regex doesn't work somehow (also I deleted the post before seeing your reply, sorry) – slebetman Nov 23 '12 at 04:14
the second one was broken for me before I added a `return` before the `char.charCodeAt...` – MalcolmOcean Apr 27 '14 at 03:01
`/^[\u0-\u7f]*$/.test("a/b")` returns false for some reason, it turns out the fix is `/^[\u0000-\u007f]*$/`. I've edited the answer. – Flimm Sep 14 '15 at 16:00
The variable name is inverted. /^[\u0000-\u007f]*$/.test(str) = true when ascii so variable name should be: var isAscii = /^[\u0000-\u007f]*$/.test(str) – Munawwar Aug 28 '21 at 08:02
Or it should be negated like `var hasMoreThanAscii = !/^[\u0000-\u007f]*$/.test(str);` – ypresto Nov 04 '21 at 14:23
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/rest_parameters – mustafa candan Aug 27 '22 at 10:52

elclanrs · Accepted Answer · 2012-11-23T05:11:48.543

23

Try this regex. It tests for the printable ASCII characters, from space (32) to tilde (126):

var ascii = /^[ -~]+$/;

if ( !ascii.test( str ) ) {
  // string is empty or has characters outside the printable ASCII range
}

Edit: with tabs and newlines:

/^[ -~\t\n\r]+$/;
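
The two patterns could be combined into one helper, roughly like this. It is only a sketch: the function name `isPrintableAscii` and the `allowWhitespace` parameter are assumptions for illustration, and it uses `*` instead of `+` so that the empty string passes, which differs from the patterns above.

```javascript
// Hypothetical helper; the name and the allowWhitespace option are assumptions.
function isPrintableAscii(str, allowWhitespace = false) {
  // [ -~] covers space (32) through tilde (126).
  // The optional variant also accepts tab, line feed and carriage return.
  const pattern = allowWhitespace ? /^[ -~\t\n\r]*$/ : /^[ -~]*$/;
  return pattern.test(str);
}

console.log(isPrintableAscii("hello world"));
console.log(isPrintableAscii("tab\there"));
console.log(isPrintableAscii("tab\there", true));
console.log(isPrintableAscii("café"));
```

Control characters such as BEL (`\u0007`) are rejected in both modes, which is the distinction raised in the comment under the question.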

edited Nov 23 '12 at 05:11

answered Nov 23 '12 at 03:49

elclanrs


So tabs and newlines don't count as OK characters? – Jonathan Leffler Nov 23 '12 at 04:58
@JonathanLeffler: Right... I added that case as well. – elclanrs Nov 23 '12 at 05:12
@elclanrs I'm glad that you differentiated, though, because for many usecases they wouldn't be desired. – wwaawaw Nov 23 '12 at 05:38
All Ascii characters have meanings, but not all of them are allowed or suitable in a particular context. The variable name `ascii` would be misleading here. – Jukka K. Korpela Nov 23 '12 at 07:45

score 11 · Answer 3 · answered Nov 23 '12 at 03:40

charCodeAt can be used to get the character code at a certain position in a string.

function isAsciiOnly(str) {
    for (var i = 0; i < str.length; i++)
        if (str.charCodeAt(i) > 127)
            return false;
    return true;
}
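
If you also want to know where the first offending character is (for an error message, say), the same loop can return an index instead of a boolean. This is a sketch under that assumption; `firstNonAsciiIndex` is a made-up name.

```javascript
// Hypothetical variant of the loop above: returns the index of the first
// UTF-16 code unit above 127, or -1 when the string is ASCII-only.
function firstNonAsciiIndex(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 127) {
      return i;
    }
  }
  return -1;
}

console.log(firstNonAsciiIndex("hello"));
console.log(firstNonAsciiIndex("naïve"));
```

The index counts UTF-16 code units, the same units that `str.length` and `charCodeAt` use.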

Nathan Wall


Isn't the largest ASCII character 127? – alex Nov 23 '12 at 03:41
I believe the largest ASCII code now is 255 check here http://www.ascii-code.com/ – repzero Feb 05 '17 at 23:04

score 1 · Answer 4 · answered Jun 21 '21 at 15:00

Simpler alternative to @alex's solution:

const hasNonAsciiCharacters = str => /[^\u0000-\u007f]/.test(str);
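
For the use case in the question, this check could be applied separately to the local part and the domain of an address. The following is a minimal sketch, assuming the address has the form local@domain and splitting at the last `@`; `classifyEmail` and its property names are hypothetical. A domain that is already Punycode-encoded (an `xn--` label) is plain ASCII, so this would not flag it.

```javascript
const hasNonAsciiCharacters = (str) => /[^\u0000-\u007f]/.test(str);

// Hypothetical helper: reports which side of the address contains non-ASCII.
function classifyEmail(address) {
  const at = address.lastIndexOf("@");
  if (at === -1) {
    return null; // not in local@domain form
  }
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  return {
    internationalLocalPart: hasNonAsciiCharacters(local),
    internationalDomain: hasNonAsciiCharacters(domain),
  };
}

console.log(classifyEmail("user@example.com"));
console.log(classifyEmail("jösé@example.com"));
console.log(classifyEmail("user@bücher.example"));
```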

Munawwar


score 0 · Answer 5 · answered Jun 23 '22 at 16:25

You can use string.match() or regex.test() to achieve this. Each of the following functions returns true if the string contains only ASCII characters.

string.match()

function isAsciiString(text) {
    let isAscii = true;
    if (text && !text.match(/^[\x00-\x7F]+$/g)) {
        isAscii = false;
    }
    return isAscii;
}

regex.test()

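function isAsciiString(text) {
    // `*` so the empty string counts as ASCII-only, like the match() version;
    // the `g` flag is not needed for a single anchored test.
    return /^[\x00-\x7F]*$/.test(text);
}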

Example

console.log(isAsciiString("hello"));   // true
console.log(isAsciiString("hello©"));  // false
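
One caveat about the `g` flag used above: it is harmless on a regex literal that is created on every call, but it becomes a pitfall if the pattern is hoisted into a shared constant, because `test()` on a global regex remembers `lastIndex` between calls. A small sketch, assuming such a hoisted constant (the variable names are made up):

```javascript
// Assumption: the pattern is stored once and reused, which the answer does not do.
const sharedPattern = /^[\x00-\x7F]+$/g;

const first = sharedPattern.test("hello");  // true; lastIndex is now 5
const second = sharedPattern.test("hello"); // false; the search resumed at index 5
const third = sharedPattern.test("hello");  // true; the failed match reset lastIndex to 0
console.log(first, second, third);

// Without the g flag, test() keeps no state between calls.
const plainPattern = /^[\x00-\x7F]+$/;
console.log(plainPattern.test("hello"), plainPattern.test("hello"));
```

Dropping the `g` flag avoids the problem entirely for a whole-string check like this one.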
