I want to let users register a username that contains only letters (of any language), numbers, or hyphens, and I'm trying to detect usernames that break this rule.

So far this is working to find out if a username does not contain only alphanumeric characters:

REFindNoCase('^[[:alnum:]]', ARGUMENTS.Username)

Which is fine, because if I get a match then I know it's an invalid username format with special characters in it. But I also want to allow hyphens through. How could I express that in regex? Something like (pseudo-code follows):

REFindNoCase('^[[:alnum:]\-]', ARGUMENTS.Username)

I can only use Perl-compatible regex because I am using ColdFusion, which mostly follows that standard.

ikegami
volume one

2 Answers

4

First of all, you're wrong about REFindNoCase('^[[:alnum:]]', ARGUMENTS.Username) being fine. It only checks whether the first character is alphanumeric.

$ for q in Abcdef Abc123 Abc-123 Abc/123 ; do
   if echo "$q" | grep -qP '^[[:alnum:]]'
   then echo "$q: match"
   else echo "$q: no match"
   fi
done
Abcdef: match
Abc123: match
Abc-123: match
Abc/123: match

(grep -P uses PCRE too.)

To look for a character that is not an alnum character, you'd use

[^[:alnum:]]

As seen here:

$ for q in Abcdef Abc123 Abc-123 Abc/123 ; do
   if echo "$q" | grep -qP '[^[:alnum:]]'
   then echo "$q: match"
   else echo "$q: no match"
   fi
done
Abcdef: no match
Abc123: no match
Abc-123: match
Abc/123: match

To look for a character that is neither an alnum character nor "-", you'd use

[^[:alnum:]-]

As seen here:

$ for q in Abcdef Abc123 Abc-123 Abc/123 ; do
   if echo "$q" | grep -qP '[^[:alnum:]-]'
   then echo "$q: match"
   else echo "$q: no match"
   fi
done
Abcdef: no match
Abc123: no match
Abc-123: no match
Abc/123: match

By the way, REFind would work just as well as REFindNoCase, since alnum includes both uppercase and lowercase letters, so you might as well use REFind.
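
A rough way to see this from the shell, using grep's -i flag as a stand-in for REFindNoCase. The helper names has_bad_char and has_bad_char_nocase are made up for this sketch, and it assumes a grep built with -P support:

```shell
# has_bad_char and has_bad_char_nocase are hypothetical helper names.
# Each succeeds (exit status 0) if the argument contains a character
# that is neither an alnum character nor "-".
has_bad_char() {
   printf '%s\n' "$1" | grep -qP '[^[:alnum:]-]'
}
has_bad_char_nocase() {
   # Same pattern, but searched case-insensitively.
   printf '%s\n' "$1" | grep -qPi '[^[:alnum:]-]'
}

# Print both results side by side for a few sample usernames.
for q in abcdef ABCDEF Abc-123 Abc/123 ; do
   if has_bad_char "$q" ; then cs="match" ; else cs="no match" ; fi
   if has_bad_char_nocase "$q" ; then ci="match" ; else ci="no match" ; fi
   echo "$q: case-sensitive=$cs, case-insensitive=$ci"
done
```

Both columns should agree for every input, which is why the case-insensitive variant buys nothing here.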

ikegami
  • Thank you for the suggestion. After much trial and error, I just solved it using `REFind('[^[:alnum:]-]', ARGUMENTS.Username)`. I have no idea why it works, but the only difference is that the negation `^` is inside the first opening square bracket – volume one Oct 18 '15 at 00:06
  • That's the opposite of what you asked. That checks whether the string contains a character that's neither an "alnum" nor a "-". – ikegami Oct 18 '15 at 00:09
  • How do you mean? I want it to match where a string does not contain alphanumeric characters or a hyphen. Which would then in turn flag to me that the username contains special characters like `<>%$£` etc – volume one Oct 18 '15 at 00:10
  • Right, which is the opposite of what you asked. – ikegami Oct 18 '15 at 00:13
  • Are there any other hyphens/dashes I need to include? There is a Unicode pattern of `{Pd}` which matches all sorts of dashes. Do you know of any equivalent I could use? – volume one Oct 18 '15 at 00:29
  • I don't know what you mean by "need to include". – ikegami Oct 18 '15 at 00:29
  • I mean I want it not to match the minus character as well, which I guess is different from a hyphen. I suppose an em-dash and en-dash would also be allowed, so it shouldn't match on those either. Do I have to add all these manually like the hyphen character? – volume one Oct 18 '15 at 00:31
  • If you want to look for those specific chars, then yeah, you're going to have to add them. It will only look for the characters you specify. – ikegami Oct 18 '15 at 00:33
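
As a sketch of what "adding them" could look like, here is the same negated class with an en dash (U+2013) and an em dash (U+2014) listed next to the plain "-". The helper name has_bad_char_dashes is made up for illustration, and this assumes a UTF-8 locale so that grep -P reads each dash as a single character:

```shell
# has_bad_char_dashes is a hypothetical helper name.
# Succeeds (exit status 0) if the argument contains a character that is
# not an alnum character, "-", an en dash (U+2013) or an em dash (U+2014).
# The "-" stays last in the class so it is taken literally, not as a range.
# Assumption: a UTF-8 locale, so each dash is read as one character.
has_bad_char_dashes() {
   printf '%s\n' "$1" | grep -qP '[^[:alnum:]–—-]'
}

for q in 'Abc-123' 'Abc–123' 'Abc—123' 'Abc/123' ; do
   if has_bad_char_dashes "$q"
   then echo "$q: match"
   else echo "$q: no match"
   fi
done
```

Any other dash-like character you want to let through would have to be added to the class the same way.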
0

Update: I looked at this question: RegEx: \w - "_" + "-" in UTF-8

Final solution after trial and error (the negation had to be inside the opening bracket):

REFind('[^[:alnum:]-]', ARGUMENTS.Username)
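
For comparison, the same rule can be written the other way round: instead of searching for one disallowed character, anchor the pattern at both ends and require every character to be allowed. A minimal shell sketch, assuming a grep with -P support (is_valid_username is a made-up helper name):

```shell
# is_valid_username is a hypothetical helper name.
# Succeeds (exit status 0) only if the whole argument consists of one or
# more characters that are each an alnum character or "-".
is_valid_username() {
   printf '%s\n' "$1" | grep -qP '^[[:alnum:]-]+$'
}

for q in Abcdef Abc123 Abc-123 Abc/123 '' ; do
   if is_valid_username "$q"
   then echo "'$q': valid"
   else echo "'$q': invalid"
   fi
done
```

One difference to keep in mind: the anchored form with `+` rejects an empty username, whereas the negated search finds nothing to complain about in an empty string.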

Community
volume one