
Regex for Java: I need to match the value of a request parameter that may contain any Unicode characters but must not contain spaces. Basically, I need a regex that allows all Unicode characters except spaces. I have tried everything I could, but in vain :(

I got the regex below from your site, but it allows spaces too, so please help.

[[a-zA-Z]*[^\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]*[a-zA-Z]]{1,440}

For example, "Suraj$÷" should match (true), but " Suraj $÷" should not (false).

tchrist
Suraj
  • What is a "request parameter"? – Tomalak Aug 26 '11 at 09:36
  • A request parameter is like message content in a URL (e.g. http:ip:server?messcon="UTF-content"). I need to match the content of messcon (from the URL) against a regex that only allows UTF characters and no spaces – Suraj Aug 26 '11 at 10:29

1 Answer


How about:

^[^\p{whitespace}]+$

or

^\P{whitespace}+$

or, if the Unicode character property {whitespace} isn't allowed,

^[^\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+$

Each of these matches only strings that contain no Unicode white space characters.
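As a rough illustration, the explicit character class could be used from Java as sketched below. The class name `NoWhitespaceCheck` and the method `isValid` are hypothetical names chosen for this example; only `java.util.regex.Pattern` is the real API. Note the doubled backslashes, so that the regex engine (not the Java compiler) interprets the escapes.

```java
import java.util.regex.Pattern;

public class NoWhitespaceCheck {
    // Negated class listing the Unicode white space code points from the pattern above.
    // The backslashes are doubled so the regex engine receives the escapes itself.
    private static final Pattern NO_WHITESPACE = Pattern.compile(
        "^[^\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E"
        + "\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]+$");

    // Hypothetical helper: true when the value is non-empty and has no white space.
    static boolean isValid(String value) {
        return NO_WHITESPACE.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Suraj$\u00F7"));   // true
        System.out.println(isValid(" Suraj $\u00F7")); // false
    }
}
```

Because the pattern uses only an explicit character class, it should not depend on any Unicode property support in the regex engine.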

tchrist
Toto
  • I am using JDK 1.4 and am getting the following error: Unknown character category {whitespace} – Suraj Aug 26 '11 at 09:49
  • A request parameter is a value, such as message content, appended to a URL; it may contain only UTF-8 characters and no spaces – Suraj Aug 26 '11 at 09:51
  • @Suraj: have a look at http://en.wikipedia.org/wiki/Whitespace_character then make a character class with all the values for white space in Unicode – Toto Aug 26 '11 at 09:58
  • But is there not a simple regex that will help me out here? – Suraj Aug 26 '11 at 10:06
  • @Suraj: I'm not a Java expert, but it seems JDK 1.4 doesn't recognize this character category, so you have to fill in your character class explicitly. – Toto Aug 26 '11 at 10:14
  • @Suraj: The Unicode whitespace characters are `[\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]`. Java does not have a property that matches that in Java6 or prior, but in JDK7, the `UNICODE_CHARACTER_CLASS` or `"(?U)"` compilation flag should swap `\s` over to aligning with [UTS#18](http://unicode.org/reports/tr18/#Compatibility_Properties) (and therefore that bracketed character class). I haven’t checked to make sure it really does what it says it does, though. – tchrist Aug 26 '11 at 11:50
  • @M42: Java prior to JDK 7 does not have a pattern class that meets the Level 1 requirements of [UTS#18](http://unicode.org/reports/tr18/#Basic_Unicode_Support) for Basic Unicode Support. It therefore lacked the `White_Space` property required by [RL 1.2, Properties](http://unicode.org/reports/tr18/#Categories). It added some missing props in JDK7, so that might be there now, but it is still a long ways from [RL 2.7, Full Properties](http://www.unicode.org/reports/tr18/tr18-14.html#Full_Properties). – tchrist Aug 26 '11 at 11:55
  • So, using these [\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000] values, can we :) form a regex that takes any UTF-8 characters with space? Even if it's a bit complicated, that's fine; I'll try it out here too. Thanks in advance for the help – Suraj Aug 26 '11 at 12:14
  • @Suraj you shouldn’t be thinking about “UTF-8 characters”. You should have already decoded the UTF-8 into abstract Unicode characters. Otherwise the regex engine won’t work right. – tchrist Aug 26 '11 at 13:14
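To tie the last few comments together, here is a minimal sketch that first decodes a raw, percent-encoded parameter value from UTF-8 and only then applies the regex, using the `(?U)` flag mentioned above. It assumes Java 7 or later, where the flag is exposed as `Pattern.UNICODE_CHARACTER_CLASS`, and it assumes the value arrives still percent-encoded; a servlet container may already have decoded it, in which case the decoding step would not apply. `DecodedParamCheck` and `isValidRawParam` are hypothetical names for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

public class DecodedParamCheck {
    // (?U) turns on Pattern.UNICODE_CHARACTER_CLASS (Java 7+), so \S means
    // "not a Unicode White_Space character" rather than "not ASCII white space".
    private static final Pattern NO_WHITESPACE = Pattern.compile("(?U)\\S+");

    // Hypothetical helper: rawValue is assumed to be the still-encoded parameter value.
    static boolean isValidRawParam(String rawValue) throws UnsupportedEncodingException {
        // Decode the UTF-8 bytes into Unicode characters before matching.
        String decoded = URLDecoder.decode(rawValue, "UTF-8");
        return NO_WHITESPACE.matcher(decoded).matches();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(isValidRawParam("Suraj%24%C3%B7"));       // true
        System.out.println(isValidRawParam("%20Suraj%20%24%C3%B7")); // false
        System.out.println(isValidRawParam("a%C2%A0b"));             // false (no-break space)
    }
}
```

On JDK 6 or earlier the `(?U)` flag is not available, so the explicit bracketed character class would be needed instead.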