Basically, I need to check that a UTF-16 string does not contain any of these characters: /:*?<>|+. Apart from those, it may contain any character, from English to Latin.

For normal ASCII strings, we would write a regex something like ^[^\/:*?<>|+]*$. How does this expression change for UTF-16 strings?

Can we write this expression using ASCII characters in the regex, or do we need to use the equivalent Unicode code points to match these characters?

  • What happens if you try? Which language are you using? [And do some reading here.](http://www.regular-expressions.info/unicode.html) – Martin Ender Oct 23 '12 at 12:57
  • What programming language? Handling of Unicode varies greatly. It should just work in most cases, though. – dan1111 Oct 23 '12 at 13:03
  • I have tried the above expression in JavaScript and tested it with strings in different natural languages (en, jp, tw). It seems to pass them (no match) and to block (match) whenever any of these special characters appears, but I was wondering whether that's the right way. I need this expression for JavaScript, C++ and XML (XSD validation). The tricky part is the XSD, where there is no way to specify Unicode code points (that is, \u002F etc.), so it would be great if the above ASCII expression works. I just wanted to confirm that I am not missing some detail about how UTF-16 strings should be matched with a regex. – Vijay Oct 23 '12 at 13:42

1 Answer

Since all of the special characters that you don't want to allow are normal ASCII characters, use the regex pattern

/^[^\/:*?<>|+]*$/
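
A minimal sketch of how this pattern behaves in JavaScript, whose strings are sequences of UTF-16 code units. The helper name `isAllowed` and the sample strings are assumptions made for illustration; they are not part of the question.

```javascript
// Pattern from the answer: every character in the string must lie
// outside the forbidden set / : * ? < > | +
const allowedPattern = /^[^\/:*?<>|+]*$/;

// Hypothetical helper, named only for this illustration.
function isAllowed(text) {
  return allowedPattern.test(text);
}

console.log(isAllowed("report-2012"));      // true: plain ASCII, nothing forbidden
console.log(isAllowed("日本語のファイル"));   // true: non-ASCII characters are accepted
console.log(isAllowed("smile \u{1F600}"));  // true: a surrogate pair is two allowed code units
console.log(isAllowed("a/b"));              // false: contains /
console.log(isAllowed("what?"));            // false: contains ?
console.log(isAllowed(""));                 // true: the * quantifier also accepts an empty string
```

The forbidden characters are all below U+0080, so each one is a single UTF-16 code unit and the ASCII form of the class matches them directly. If an empty string should be rejected, replacing the trailing `*` with `+` would be one way to do it.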
Ωmega
  • I know that you are just repeating the pattern that the questioner gave, but it is worth pointing out that this checks that the string *only* contains those characters (and also matches an empty string). This would check if the string contains any of the characters: `/[\/:*?<>|+]/`. – dan1111 Oct 23 '12 at 13:06
  • @dan1111 - Question title is **"...to check the string does not contain specific characters"**, so I believe it should match if it does not contain such characters. – Ωmega Oct 23 '12 at 13:08
  • Sorry, my mistake. I somehow missed the negation on the character class. – dan1111 Oct 23 '12 at 13:17
  • Thanks. I just wanted to confirm that we can use regular ASCII special characters ($, <, >, etc.) as they are for Unicode strings. – Vijay Oct 23 '12 at 13:44
  • @Vijay - Yes, you can, but don't forget that some characters sometimes need to be escaped with `\`. – Ωmega Oct 23 '12 at 13:59
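
As a rough illustration of the escaping point in JavaScript: which characters need a backslash depends on how the pattern is written. The variable names below are assumptions, and the variant that also forbids a backslash is a hypothetical extension that the question does not ask for.

```javascript
// In a regex literal, / is escaped so it cannot be read as the closing delimiter.
const fromLiteral = /^[^\/:*?<>|+]*$/;

// In a pattern string passed to the RegExp constructor, / needs no escaping.
const fromString = new RegExp("^[^/:*?<>|+]*$");

// Hypothetical extension: also forbid a backslash. The regex needs \\ inside
// the class, and each backslash must be doubled again in the string literal.
const noBackslash = new RegExp("^[^/\\\\:*?<>|+]*$");

for (const sample of ["plain", "a/b", "a\\b"]) {
  console.log(
    JSON.stringify(sample),
    fromLiteral.test(sample),
    fromString.test(sample),
    noBackslash.test(sample)
  );
}
```

The first two patterns should agree on every input; only the third rejects a string containing a backslash.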