
My apologies, I have never gotten to grips with regular expressions.

I need to remove anything that isn't alphanumeric from a string (a product name; some of them contain ampersands), while keeping the spaces.

So far I have this (found from another stackoverflow post):

$productname = preg_replace("~[\W]~","",$product['name']);

Now this replaces everything that isn't alphanumeric with "". That is fine, except that it also removes all spaces, and I want to keep them. Also, I cannot find anywhere what the tilde (~) does in regex.

With regard to the spaces, I have seen that there is a negative lookahead, achieved by ?!, but I don't know how to incorporate that in the above.

Chud37
  • the ~ is a delimiter for PCRE regex, http://www.php.net/manual/pt_BR/regexp.reference.delimiters.php – rray Nov 04 '13 at 14:40
  • `I have never gotten to grips with regex expressions.` Regex means "regular expressions", so you don't need to say regex expressions, because it's like saying: regular expressions expressions. Just a hint. :) – Rafael Barros Nov 04 '13 at 14:53

2 Answers


Also, I cannot find anywhere what the tilde (~) does in regex.

Regular expressions passed to PHP's preg_* functions are enclosed in a pair of delimiters. Common choices are /, # and ~, but you can use any character that is not alphanumeric, a backslash, or whitespace, as described in the documentation.
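To make the role of the delimiter concrete, here is a minimal sketch that runs the question's \W pattern with three different delimiters. The sample string is made up for illustration; only the pattern comes from the question.

```php
<?php
// Hypothetical sample input, not taken from the question.
$name = "Fish & Chips";

// The same pattern (\W) wrapped in three different delimiters.
$withTilde = preg_replace('~\W~', '', $name);
$withHash  = preg_replace('#\W#', '', $name);
$withSlash = preg_replace('/\W/', '', $name);

// The delimiter is not part of the pattern, so all three results are equal.
var_dump($withTilde === $withHash && $withHash === $withSlash); // bool(true)
echo $withTilde, PHP_EOL; // FishChips
```

The delimiter only marks where the pattern starts and ends. If the delimiter character also appears inside the pattern, it has to be escaped with a backslash, which is one reason to pick a delimiter the pattern does not use.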

If you want to remove every non-alphanumeric character except whitespace, you can use:

~[^a-zA-Z0-9\s]+~
  • ^ at the start of a character class [] negates it, so the class matches any character not listed inside; for example, [^a] matches anything but a.
  • a-z matches lowercase letters.
  • A-Z matches uppercase letters.
  • 0-9 matches digits.
  • \s matches whitespace characters (spaces, tabs, and newlines).
  • + after the character class makes it match one or more consecutive characters that are not listed in the class.
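Putting the pieces of the pattern together, a minimal sketch of how it could be applied to a product name follows. The helper name strip_non_alphanumeric and the sample product are assumptions made for illustration; only the pattern and the $product['name'] access come from the thread.

```php
<?php
// Hypothetical helper name; the pattern is the one explained above.
function strip_non_alphanumeric(string $name): string
{
    // Remove every run of characters that is not a letter, digit, or whitespace.
    return preg_replace('~[^a-zA-Z0-9\s]+~', '', $name);
}

// Made-up product record for illustration.
$product = ['name' => 'Salt & Pepper Mill (Large)'];
$productname = strip_non_alphanumeric($product['name']);

echo $productname, PHP_EOL; // Salt  Pepper Mill Large
```

Note that the spaces on either side of the removed ampersand both survive, so the result contains two adjacent spaces. If that matters, a second call such as preg_replace('~\s+~', ' ', $productname) would collapse them.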

\W alone matches non-word characters, so what are word characters?

Word characters are letters (lowercase and uppercase), digits, and the underscore _.

\w (lowercase w) is usually equivalent to the character class [a-zA-Z0-9_].

\W (uppercase W) matches anything that \w does not. That includes spaces, which is why it removes them as well.
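The difference between \W and the negated character class is easiest to see side by side. This is a small sketch on a made-up input string; note that \W keeps underscores, because _ counts as a word character.

```php
<?php
// Made-up input containing underscores, an ampersand, and a space.
$name = 'Tom_&_Jerry 2';

// \W removes the ampersand and the space but keeps the underscores.
$nonWord = preg_replace('~\W~', '', $name);

// The negated class removes the underscores and the ampersand but keeps the space.
$negated = preg_replace('~[^a-zA-Z0-9\s]+~', '', $name);

echo $nonWord, PHP_EOL; // Tom__Jerry2
echo $negated, PHP_EOL; // TomJerry 2
```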

Ibrahim Najjar

You can use simple negation in a character class:

$productname = preg_replace("#[^A-Za-z0-9]+#","",$product['name']);

This replaces everything that is not alphanumeric with "". Note that it removes spaces as well; add a space (or \s) inside the brackets to keep them.
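Since the question asks to keep spaces, here is a hedged sketch of two variants of this negated class: one with a literal space added and one with \s. The input string is made up and includes a tab to show where the two differ.

```php
<?php
// Made-up input: two spaces around "&", then a tab before "2".
$name = "Nuts & Bolts\t2";

// Literal space in the class: spaces survive, the tab is removed.
$keepSpaces = preg_replace('#[^A-Za-z0-9 ]+#', '', $name);

// \s in the class: spaces and the tab both survive.
$keepWhitespace = preg_replace('#[^A-Za-z0-9\s]+#', '', $name);

var_dump($keepSpaces === "Nuts  Bolts2");       // bool(true)
var_dump($keepWhitespace === "Nuts  Bolts\t2"); // bool(true)
```

Which variant fits depends on whether tabs and newlines in a product name should be kept; for single-line names taken from a form field the two usually behave the same.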

Robert