5

Presently, I am using this:

if (preg_match('/^[a-zA-Z0-9_]+([a-zA-Z0-9_]*[.-]?[a-zA-Z0-9_]*)*[a-zA-Z0-9_]+$/', $product)) {
    return true;
} else {
    return false;
}

For example, I want to match:

  1. pro.duct-name_
  2. _pro.duct.name
  3. p.r.o.d_u_c_t.n-a-m-e
  4. product.-name
  5. ____pro.-_-.d___uct.nam._-e

But I don't want to match:

  1. pro..ductname
  2. .productname-
  3. -productname.
  4. -productname
mickmackusa
banskt
  • Edited the examples so that it's more understandable. Does it need further explanation? Please let me know, I would be glad to clarify further. – banskt May 26 '12 at 07:13
  • Why shouldn't `pro..ductname` match? The dots are in the middle? – Ja͢ck May 26 '12 at 07:16
  • Is it only the `dot` that must not appear twice in a row, or any character? – Cylian May 26 '12 at 07:16
  • Because, I don't want to match `dot` or `dash` twice consecutively. `Dot` and `dash` can appear multiple number of times in the middle, but not consecutively. Now, what happens if `dot` and `dash` appear after one another? We allow `product.-name` – banskt May 26 '12 at 07:19
  • Q: "If only `dot` would not come twice or any character?" A: `Dot` and `dash` would not come twice, any other alphanumeric characters can come twice, `ppppppppp` should match. – banskt May 26 '12 at 07:23

5 Answers

11

The answer would be

/^[a-zA-Z0-9_]+([-.][a-zA-Z0-9_]+)*$/

provided that strings containing `.-` or `-.` should NOT match. Why would you allow them to match, anyway? But if you really need these strings to match too, a possible solution is

/^[a-zA-Z0-9_]+((\.(-\.)*-?|-(\.-)*\.?)[a-zA-Z0-9_]+)*$/

The single `.` or `-` of the first regex is replaced by a sequence of alternating `.` and `-` characters. The sequence starts with either `.` or `-`, is optionally followed by `-.` or `.-` pairs respectively, and optionally ends with one more `-` or `.` respectively, which allows for an even number of alternating characters. This complexity is probably overkill, but the current specification appears to require it. If at most two alternating `.` and `-` characters are needed, the regex becomes

/^[a-zA-Z0-9_]+((\.-?|-\.?)[a-zA-Z0-9_]+)*$/
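To see how the three patterns differ, here is a minimal PHP sketch that runs each of them over a few sample strings. The array keys (`single`, `alternating`, `max_two`) and the helper `matchesPattern()` are hypothetical names chosen for illustration, and `a.-.b` is an extra sample that is not among the asker's examples.

```php
<?php
// The three patterns from this answer; the keys are illustrative labels only.
$patterns = [
    'single'      => '/^[a-zA-Z0-9_]+([-.][a-zA-Z0-9_]+)*$/',
    'alternating' => '/^[a-zA-Z0-9_]+((\.(-\.)*-?|-(\.-)*\.?)[a-zA-Z0-9_]+)*$/',
    'max_two'     => '/^[a-zA-Z0-9_]+((\.-?|-\.?)[a-zA-Z0-9_]+)*$/',
];

// Hypothetical helper: true when $name matches the given pattern.
function matchesPattern(string $pattern, string $name): bool
{
    return preg_match($pattern, $name) === 1;
}

// 'a.-.b' contains three alternating separators in a row.
$samples = ['pro.duct-name_', 'product.-name', 'a.-.b', 'pro..ductname', '-productname'];

foreach ($samples as $name) {
    foreach ($patterns as $label => $pattern) {
        $result = matchesPattern($pattern, $name) ? 'match' : 'no match';
        printf("%-16s %-12s %s\n", $name, $label, $result);
    }
}
```

Only the second pattern accepts `a.-.b`; the second and third both accept `product.-name`, and none of them accepts `pro..ductname`. One caveat: in PCRE, `$` also matches just before a trailing newline, so a stricter validator might use `\z` or the `D` modifier instead.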

Walter Tross
  • The second one actually works. Thanks a lot, though I must admit, I do not completely understand the sequence of your second regex. – banskt May 26 '12 at 09:33
  • And, I love this bit - `(\.-?|-\.?)[a-zA-Z0-9_]+` in the regex. That solves the problem. Great logic. – banskt May 26 '12 at 10:13
  • :-) thanks. I added a last regex taking into account what you just wrote – Walter Tross May 26 '12 at 10:26
3

Try this

(?im)^([a-z_][\w\.\-]+)(?![\.\-])\b

UPDATE 1

(?im)^([a-z_](?:[\.\-]\w|\w)+(?![\.\-]))$

UPDATE 2

(?im)^([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)$

Explanation

<!--
(?im)^([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)$

Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m) «(?im)»
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below and capture its match into backreference number 1 «([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)»
   Match a single character present in the list below «[a-z_]»
      A character in the range between “a” and “z” «a-z»
      The character “_” «_»
   Match the regular expression below «(?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      Match either the regular expression below (attempting the next alternative only if this one fails) «\.\-\w»
         Match the character “.” literally «\.»
         Match the character “-” literally «\-»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 2 below (attempting the next alternative only if this one fails) «\-\.\w»
         Match the character “-” literally «\-»
         Match the character “.” literally «\.»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 3 below (attempting the next alternative only if this one fails) «\-\w»
         Match the character “-” literally «\-»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 4 below (attempting the next alternative only if this one fails) «\.\w»
         Match the character “.” literally «\.»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 5 below (the entire group fails if this one fails to match) «\w»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
-->
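As a rough check of the breakdown above, the following PHP sketch wraps the UPDATE 2 pattern in `/` delimiters (the delimiters and the helper name `matchesUpdate2()` are assumptions, not part of the answer) and tries it on a few strings.

```php
<?php
// Hypothetical helper around the UPDATE 2 pattern; the '/' delimiters are an assumption.
function matchesUpdate2(string $name): bool
{
    return preg_match('/(?im)^([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)$/', $name) === 1;
}

var_dump(matchesUpdate2('product.-name'));   // bool(true):  ".-" followed by a word character
var_dump(matchesUpdate2('pro..ductname'));   // bool(false): no alternative allows ".."
var_dump(matchesUpdate2('123product'));      // bool(false): first character must be a letter or "_"
var_dump(matchesUpdate2("bad..name\ngood")); // bool(true):  with (m), the second line matches on its own
```

The last call illustrates the `(m)` option described in the breakdown: because `^` and `$` match at line breaks, a multi-line string passes as soon as any single line is valid. Dropping `m` from the inline options is one way to avoid that if whole-string validation is intended.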

Cylian
  • \w is not the same as [a-zA-Z0-9_] – Walter Tross May 26 '12 at 08:10
  • I don't know if this is what @Walter is referring to, but to elaborate a bit, [the PHP manual](http://www.php.net/manual/en/regexp.reference.escape.php) says: _A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w._ – Herbert May 26 '12 at 08:39
  • @WalterTross: Can you provide an example where it doesn't work? The test works fine for the OP's example data. – Herbert May 26 '12 at 08:46
  • product.-name is required to match (see the OP's comments), and doesn't. 123product should match too, given the OP's first regex. The part (?![\.\-]) is not needed, because it is implied in what precedes it. [\.\-] is more readable as [-.] – Walter Tross May 26 '12 at 08:52
  • @WalterTross: Thanks for pointing that out. See my update 2. It is also true that there is no need for an extra `negative lookahead`. If `123product` has to match, the pattern would be simpler: just replace the first `character class` with `\w`. The OP has never commented on this. – Cylian May 26 '12 at 09:12
  • Thanks @Cylian for explaining your answer. Now I understand the syntax. It should also now match `product.-name`. Regarding `123product` I didn't emphasize because I could have inserted it myself. The main thing that was giving me problems was to have alphanumerics/underscore at the beginning /end, plus multiple appearance of `dot` and `dash` in the middle without having any one of them appearing consecutively. Thanks a lot for the answer. I really appreciate your answer. :) – banskt May 26 '12 at 09:44
  • @banskt: You're welcome. Does it solve your problem? If need any update, let me know. – Cylian May 26 '12 at 09:45
1

This should do:

/^[A-Za-z0-9_]([.-]?[A-Za-z0-9_]+)*[.-]?[A-Za-z0-9_]$/

It will make sure that the word begins and ends with alphanumeric or underscore character. The bracket in the middle will make sure that there will be at most one period or dash in a row, followed by at least one alphanumeric or underscore character.
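A small PHP sketch of this behaviour, assuming every character class is spelled `A-Za-z0-9_` (a range written `A-z` spans ASCII 65 to 122 and so also admits `[`, `\`, `]`, `^` and the backtick). The helper name `matchesStrict()` is hypothetical.

```php
<?php
// Hypothetical helper; every class is written A-Za-z0-9_ (an assumption, see the lead-in).
function matchesStrict(string $name): bool
{
    return preg_match('/^[A-Za-z0-9_]([.-]?[A-Za-z0-9_]+)*[.-]?[A-Za-z0-9_]$/', $name) === 1;
}

var_dump(matchesStrict('pro.duct-name_')); // bool(true)
var_dump(matchesStrict('product.-name'));  // bool(false): "." and "-" are adjacent
var_dump(matchesStrict('pro..ductname'));  // bool(false)
var_dump(matchesStrict('a'));              // bool(false): at least two characters are required

// Why the range matters: "^" (ASCII 94) lies between "Z" (90) and "a" (97).
var_dump(preg_match('/^[A-z]$/', '^'));    // int(1)
var_dump(preg_match('/^[A-Za-z]$/', '^')); // int(0)
```

Note that this pattern rejects `product.-name`, which the asker wants to accept, so it fits only the stricter reading where a dot and a dash may not be adjacent.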

domvoyt
0

/^[A-Z0-9_][A-Z0-9_.-]*[A-Z0-9_]$/i

This makes sure the first and last character is not a dash or period; the rest in between may consist of any character (within your chosen set).

Ja͢ck
0

The regex below checks for a string made of letters, digits, underscores and dashes, with exactly one dot in the middle.

/^[A-Za-z0-9_-]+(\.){1}[A-Za-z0-9_-]+$/i

hope this helps

Kasia Gogolek