2

I have a question about valid computer names. According to http://support.microsoft.com/kb/188997:

A computer name can be up to 15 alphanumeric characters with no blank spaces. The name must be unique on the network and can contain the following special characters: ! @ # $ % ^ & ( ) - _ ' { } . ~

The following characters are not allowed: \ * + = | : ; " ? < > ,

I am developing in C++, so I used the following code, but when I input a character which isn't allowed, it still matches. Why?

 regex  rgx("[a-zA-Z0-9]*(!|@|#|$|%|^|&|\(|\)|-|_|'|.|~|\\{|\\})*[a-zA-Z0-9]*");


string name;
    cin>>name;

if (regex_match(name, rgx))
{
    cout << " Matched :) " << endl;
}
else
    cout << "Not Matched :(" << endl;

Your help will be greatly appreciated :)

Rehab Reda
  • Beware: the [currently accepted answer](https://stackoverflow.com/a/24095455/3225396) does not correct an additional error in your example approach: NetBIOS names [forbid](https://en.wikipedia.org/wiki/NetBIOS#NetBIOS_name) "`.`" and "`-`" from being the first or last character. – Keith Russell Oct 16 '20 at 14:26

2 Answers

2

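Your regular expression will match almost any string. All your quantifiers are "none or more characters" (*), so even an empty string matches, and the group contains an unescaped . which matches any character, so the repeated group (...|.|...)* can consume the forbidden characters too. Also, you're using an unescaped ^ and $ within the group (...|^|...): these are anchors, not literal characters, so ^ will never match unless this position is the beginning of the string (which may happen because the preceding * quantifier can match nothing). Note that regex_match already requires the whole string to match, so missing start and end anchors are not the cause here.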
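
The effect of an unescaped . inside a repeated group can be checked directly. The following is only a minimal sketch: the helper `matches_whole` and the two reduced patterns are made up for illustration and keep just two alternatives from the question's group.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches the whole of `text`.
// std::regex_match only succeeds when the entire input is consumed.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Reduced stand-ins for the pattern in the question (assumption: only the
// "-" and "." alternatives are kept, to isolate the effect of the dot).
const std::string unescaped_dot = R"([a-zA-Z0-9]*(-|.)*[a-zA-Z0-9]*)";
const std::string escaped_dot   = R"([a-zA-Z0-9]*(-|\.)*[a-zA-Z0-9]*)";
```

With these definitions, a name such as a*b is accepted by `unescaped_dot` but rejected by `escaped_dot`, while the empty string is accepted by both because every part of the pattern is optional.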

It's a lot easier to achieve what you're trying to do, though:

regex rgx("^[\\w!@#$%^&()\\-'{}\\.~]{1,15}$");

If you're using C++11, you might as well use a raw string for better readability:

regex rgx(R"(^[\w!@#$%^&()\-'{}\.~]{1,15}$)");

This should match all valid names consisting of between 1 and 15 of the permitted characters.

  • \w matches any "word" character, that is A-Z, a-z, digits, and underscores (and, depending on your locale and regex engine, possibly also umlauts and accented characters). Because of this, it might be better to replace it with A-Za-z\d_ in the above expression:

    regex rgx("^[A-Za-z\\d_!@#$%^&()\\-'{}\\.~]{1,15}$");
    

    Or:

    regex rgx(R"(^[A-Za-z\d_!@#$%^&()\-'{}\.~]{1,15}$)");
    
  • {a,b} is a quantifier matching the previous expression between a and b times (inclusive).

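  • ^ and $ anchor the regular expression to the beginning and end of the string, so it has to cover the whole string. regex_match already requires a full match, so they only become essential if you switch to regex_search.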
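
Put together, the check could be wrapped in a small function. This is a sketch under stated assumptions: the name `is_valid_computer_name` is hypothetical, the character list is the one quoted in the question (including &), and it does not enforce the additional NetBIOS restriction on a leading or trailing "." or "-" mentioned in the comment under the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): checks a candidate name
// against the allowed-character list quoted in the question.
bool is_valid_computer_name(const std::string& name) {
    // 1 to 15 characters: letters, digits, and ! @ # $ % ^ & ( ) - _ ' { } . ~
    static const std::regex pattern(R"([A-Za-z\d_!@#$%^&()\-'{}\.~]{1,15})");
    // regex_match requires the whole string to match, so ^ and $ are omitted.
    return std::regex_match(name, pattern);
}
```

The regex object is `static` so that it is compiled once rather than on every call; constructing a `std::regex` is comparatively expensive.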
Mario
  • Thank you so much, you helped me a lot! :D But I have a question about %^%()\\-'{}\\. : what is %? Is it an escape character, and what is the difference between % and \ ? And why didn't you put an escape character before ) and {? Thank you so much! :) – Rehab Reda Jun 07 '14 at 09:05
  • Ah, the additional `%` is a mistake. It's not an escape character (only for regular expressions in Lua, if you're curious). I simply wrote it a second time by accident. I didn't escape the brackets within `[]` because the parser knows it's still inside the brackets, so they can't have any special meaning there and are really just characters. – Mario Jun 07 '14 at 09:08
  • To add: You still have to escape characters that keep a special meaning inside brackets, like `-` (which could mark a range) and `\` itself. A `.` is literal inside brackets, so `[a-z.]` matches a lowercase letter or a literal dot; escaping it as `\.` is harmless but not required. – Mario Jun 07 '14 at 09:10
  • You forgot the ampersand character which is also allowed – Tal Aloni Oct 20 '19 at 08:29
1

Look here: http://www.cplusplus.com/reference/regex/ECMAScript/ . There you have something about special characters (with a special meaning for a regex).

For example, ^ has a special meaning in a regex, so you must escape it: \^. Other special characters are: $ \ . * + ? ( ) [ ] { } |.

Also, I think your regex will not allow names like a-b-c (multiple runs of special characters, or more than two runs of alphanumeric characters).
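
Both points can be tried out with a short sketch. The function names and the reduced patterns are assumptions made for illustration: the first keeps the question's "letters, specials, letters" structure with the metacharacters escaped, the second uses a single character class instead.

```cpp
#include <regex>
#include <string>

// Assumption: a reduced version of the question's pattern with the
// metacharacters ^ $ . escaped; only a few special characters are kept.
bool matches_three_part(const std::string& s) {
    static const std::regex rgx(R"([a-zA-Z0-9]*(!|\^|\$|-|\.)*[a-zA-Z0-9]*)");
    return std::regex_match(s, rgx);
}

// Same characters in one bracket expression: ^ (not first), $ and . are
// literal here, only the hyphen is escaped so it cannot form a range.
bool matches_char_class(const std::string& s) {
    static const std::regex rgx(R"([a-zA-Z0-9!^$.\-]*)");
    return std::regex_match(s, rgx);
}
```

With the escapes in place, `matches_three_part` accepts a-b but rejects a-b-c, because the pattern allows only one run of special characters between two runs of alphanumerics. `matches_char_class` accepts both, since the characters may appear in any order.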

tasegula
  • As far as I know it should be fine to not escape non-class characters within groups, e.g. `[()]` would match both brackets. The only exception would be having `^` as the first character. So `[x^]` and `[x\^]` should both represent identical expressions, but `[^x]` and `[\^x]` don't for obvious reasons. – Mario Jun 07 '14 at 08:47
  • As for the last part of your answer: With the expression as written they'll be accepted, since the unescaped `.` in the repeated group matches any character, so the group can consume `-b-c` in your example. With the `.` escaped, `a-b-c` would indeed be rejected. – Mario Jun 07 '14 at 08:48