I need a regular expression for this and have no experience with them whatsoever, so I am turning to you guys on this one.

I need to validate a classifieds title field so that it contains almost no special characters.

Only letters and numbers should be allowed, and also the three Swedish letters å, ä, ö (upper- or lowercase).

Besides the above, these should also be allowed:

  • The "&" sign.
  • Parentheses "()"
  • Mathematical signs "-", "+", "%", "/", "*"
  • Dollar and Euro signs
  • One accented letter: "é" (only this one is required).
  • Double quote and single quote signs.
  • The comma "," and point "." signs
Greg Bacon
  • FYI, the "accent sign" is not a separate character. Much as å and a are different characters, so are é and e (and ú and u, etc.), at least as far as computers are concerned. – David Gelhar May 07 '10 at 13:13
  • Then only the é is required, thanks –  May 07 '10 at 13:14
  • What encoding are you expecting the string to be in? (Obviously not ASCII; UTF-8?) Why are you removing the characters? It seems that HTML-escaping the string would be a better solution than regex-matching each bad character and removing it. – marr75 May 07 '10 at 13:15
  • If this is not a language specific question, please remove the language tags. – Gumbo May 07 '10 at 13:18

4 Answers

37

Try this:

^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]*$

Breakdown:

^ = matches the start of the string

[...]* = matches any of the characters (or ranges) inside the brackets zero or more times

$ = matches the end of the string

Updated with all the suggestions from the comments. Thanks guys!
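
For illustration, here is a minimal sketch of the same check in Python rather than PHP. The name is_valid_title is made up for this example, and it assumes the title has already been decoded to a Unicode string. fullmatch() plays the role of the ^ and $ anchors.

```python
import re

# Allowed: whitespace, digits, ASCII letters, å ä ö in both cases, é,
# and the punctuation listed in the question. The hyphen comes last so
# it is read as a literal "-" rather than as a range operator.
# Note: in Python 3, \s and \d also accept non-ASCII whitespace and digits.
TITLE_RE = re.compile(r"""[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]*""")


def is_valid_title(title: str) -> bool:
    """Return True if every character of title is in the allowed set."""
    # fullmatch() requires the whole string to match, like ^...$ above.
    return TITLE_RE.fullmatch(title) is not None


if __name__ == "__main__":
    for sample in ['Café "Blå" 5€', "Cykel <b>billig</b>", "Pris: 100"]:
        print(repr(sample), is_valid_title(sample))
```

The second and third samples are rejected because "<", ">" and ":" are not in the character class.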

Chris Van Opstal
  • Uppercase versions of åäöé too? – David Gelhar May 07 '10 at 13:16
  • What about capital å, ä, ö, é? Perhaps the /i modifier? – Dal Hundal May 07 '10 at 13:17
  • +1 Answered the question, but I think his design of sanitizing the string with a regex before inserting it into HTML is fragile; an HTML escape function would be more maintainable and sane. As for uppercase versions of the letters, I think Chris is just giving an example of how to include them; Camran can add all the letters he wants. – marr75 May 07 '10 at 13:17
  • Just add `ÅÄÖ` in there and you're done :) Oh and if you want to be thorough, explain the regex a bit, mainly what do ^, $, * and brackets mean. – Esko May 07 '10 at 13:18
  • @marr75, it's not necessarily just to sanitize before inserting into HTML; he may want to simply disallow titles with other characters in them, as they could indicate that the title is either garbage or just plain untidy. You'd still want to sanitize the result after applying this regex, though. – Dal Hundal May 07 '10 at 13:20
  • Careful, the `-` is wrong. Needs to be at the end of the character class. And most of the backslashes are unnecessary. – Tim Pietzcker May 07 '10 at 13:21
  • Could someone edit it so that the minus sign works also? Thanks guys! –  May 07 '10 at 13:27
  • shouldn't it be "^[\s\da-zA-Z0-9åäöÅÄÖ&()+%/*$€é,.'"-]*$" ? – Rahim Khoja Mar 18 '17 at 07:43
7

There is a security flaw in the accepted answer:

^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]*$

This pattern also matches the empty string, because * means zero or more occurrences.

Here is a more secure version:

^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]+$

The + quantifier requires one or more occurrences, so an empty string no longer matches.

More information can be found at https://regexr.com/
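
To make the difference concrete, here is a small Python sketch comparing the two quantifiers. The helper names accepts and is_nonblank_title are invented for this example. Since \s is in the character class, a title made only of spaces still passes the + version, so the sketch also shows one possible extra check for that case.

```python
import re

# The shared character class from the patterns above.
CHARS = r"""[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]"""

STAR = re.compile(CHARS + "*")  # zero or more allowed characters
PLUS = re.compile(CHARS + "+")  # one or more allowed characters


def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


def is_nonblank_title(text: str) -> bool:
    """Require allowed characters only, plus at least one non-space one."""
    return accepts(PLUS, text) and text.strip() != ""


if __name__ == "__main__":
    print(accepts(STAR, ""))          # True: "*" allows the empty string
    print(accepts(PLUS, ""))          # False: "+" needs at least one character
    print(accepts(PLUS, "   "))       # True: whitespace is in the class
    print(is_nonblank_title("   "))   # False
```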

Kae Cyphet
1

PHP has a variety of functions that can help with text validation. You may find them more appropriate than a straight regex. Consider strip_tags(), htmlspecialchars(), and htmlentities().

Also, if you are running PHP 5.2 or later, you can use the excellent Filter functions, which were designed for exactly this situation.
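
The escape-on-output idea is not specific to PHP. As a rough analogue only, here is how it might look in Python, with the standard library's html.escape() standing in for htmlspecialchars(). The function name render_title is hypothetical.

```python
import html


def render_title(title: str) -> str:
    """Escape a title for safe inclusion in HTML instead of rejecting it."""
    # quote=True also escapes double and single quotes, which matters
    # when the title ends up inside an HTML attribute value.
    return html.escape(title, quote=True)


if __name__ == "__main__":
    print(render_title('<b>"A" & B</b>'))
```

With this approach the title is stored as typed and only made safe at the point where it is written into the page.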

dnagirl
0
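^[\sa-zA-Z0-9åäö&()+%/*$€é,.'"-]*$

will match all the required characters; with case-insensitive matching it also covers Å, Ä, Ö and É.

In PHP:

// The i modifier makes the match case-insensitive; the u modifier treats
// pattern and subject as UTF-8 (this assumes $subject is UTF-8 encoded).
if (preg_match('#^[\sa-zA-Z0-9åäö&()+%/*$€é,.\'"-]*$#iu', $subject)) {
    // Successful match
} else {
    // Match attempt failed
}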
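
For comparison, the same "list the lowercase letters and match case-insensitively" approach can be sketched in Python, where re.IGNORECASE applies Unicode case folding to str patterns. The name is_valid_title_ci is made up for this example, and + is used here so that empty titles are rejected.

```python
import re

# Only lowercase letters are listed; IGNORECASE extends the match to
# their uppercase forms, including Å, Ä, Ö and É.
TITLE_CI = re.compile(r"""[\s0-9a-zåäö&()+%/*$€é,.'"-]+""", re.IGNORECASE)


def is_valid_title_ci(title: str) -> bool:
    """Return True if title is non-empty and uses only allowed characters."""
    return TITLE_CI.fullmatch(title) is not None


if __name__ == "__main__":
    for sample in ["ÅKERI ÄLG ÖL", "CAFÉ", "Smörgås #1"]:
        print(repr(sample), is_valid_title_ci(sample))
```

The last sample is rejected because "#" is not in the character class.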
Tim Pietzcker