-2

I need a regular expression to match a-zA-Z0-9 as well as whitespace and special characters, but only including English whitespace/special characters, not those of other languages like French or Spanish.

Thanks.

jogojapan
MBehtemam
  • 6
    define "special characters" – amit Apr 11 '12 at 07:39
  • 2
    Right now you're asking for a regex that matches `anything`, so here it is: `.*`. You'll need to specify exactly what it should and should not match; otherwise you won't get any helpful answers... – red-X Apr 11 '12 at 07:43
  • Do you mean something that matches char codes 0x20 (space) through 0x7E (tilde) but excludes non-English chars (such as those found in the 0xA0-0xFF range)? – Stuart Siegler Apr 11 '12 at 16:38
  • What about words like "naïve", which (when spelled correctly) have "special" characters in them, yet are English words? – Bohemian Apr 14 '12 at 11:54

3 Answers

1

It's not possible/practical to write a regular expression that matches English, but not French, Spanish and other languages.

If you really want to test whether a word is from the English language, you can write some code to look it up in an English dictionary. That should be simple enough.
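A minimal sketch of the dictionary lookup in Python, for illustration only. The `ENGLISH_WORDS` set and the `unknown_words` helper are hypothetical names, and the five-word set is a stand-in for a real word list loaded from a file:

```python
import re

# Hypothetical, tiny stand-in for a real English word list.
ENGLISH_WORDS = {"the", "quick", "brown", "fox", "naive"}

def unknown_words(text, dictionary=ENGLISH_WORDS):
    """Return the lowercased words in text that are not in the dictionary."""
    # [^\W\d_]+ matches runs of letters (including accented ones),
    # so a word like "café" is kept whole and then fails the lookup.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in dictionary]

print(unknown_words("The quick brown fox"))
print(unknown_words("The quick renard"))
```

An empty result means every word was found. As the comments below point out, names and places will show up as unknown unless the word list covers them.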

  • 1
    A dictionary solution tends to fail for names and places, especially when we are talking about a single word and not a full document. – amit Apr 11 '12 at 07:49
  • @amit, there are workarounds for that, for example ignoring words starting with a capital letter. It will never be perfect, but it's close enough. – Matt Apr 11 '12 at 07:54
1

Depending on the regex engine, you may be able to use:

^\p{IsBasicLatin}*$

This allows only characters in the Basic Latin block (U+0000 to U+007F), which includes standard English language punctuation (i.e., the characters that can be directly entered on a U.S. keyboard).
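As far as I know, Python's built-in `re` module does not support `\p{...}` property escapes, so the same check has to be spelled out as an explicit code point range. A minimal sketch, where `is_basic_latin` is a hypothetical helper name:

```python
import re

# Basic Latin is the Unicode block U+0000 to U+007F (plain ASCII).
BASIC_LATIN = re.compile(r"[\x00-\x7F]*")

def is_basic_latin(text):
    """Return True if every character in text is in the Basic Latin block."""
    # fullmatch anchors at both ends, so no ^ or $ is needed.
    return BASIC_LATIN.fullmatch(text) is not None

print(is_basic_latin("Hello, world! 123"))
print(is_basic_latin("café"))
```

Note that this range also admits control characters such as tab and newline. On Python 3.7 and later, `text.isascii()` performs the same test without a regex.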

drf
1

I was looking for a regular expression that would match regular English text (and ideally avoid HTML, XML, URLs, etc.) and landed on this page. I think the asker just wanted to exclude characters with accent marks while still allowing English punctuation. I ended up writing something myself by looking at my keyboard:

[A-Za-z\d,.?;:\'"!$%() ]*

I don't claim this will work for everyone, but it was good enough for me.
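For anyone who wants to try the character class above, here is a minimal Python sketch. The name `is_plain_english` is hypothetical, and the `re.ASCII` flag is my own addition so that `\d` matches only 0-9 rather than any Unicode digit:

```python
import re

# The character class from this answer, unchanged. re.ASCII is an assumption
# added here to restrict \d to the digits 0-9.
PLAIN_TEXT = re.compile(r"""[A-Za-z\d,.?;:\'"!$%() ]*""", re.ASCII)

def is_plain_english(text):
    """Return True if text consists only of characters in the class above."""
    return PLAIN_TEXT.fullmatch(text) is not None

print(is_plain_english("Hello, world! (50% off)"))
print(is_plain_english("naïve"))
print(is_plain_english("http://example.com"))
```

The class is deliberately narrow: it has no hyphen, slash, or angle brackets, which is what keeps URLs and markup out, but it also rejects ordinary text such as "well-known".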

T A
  • Is there any way to match only special characters, without including any digits or letters? – XYZ_Linux Nov 27 '13 at 05:32