
I'm trying to write a PHP function that validates names with a regex. I want names to be able to contain any number of spaces, apostrophes (') and hyphens (-). Only an uppercase letter may follow a space, but either an uppercase or a lowercase letter may follow a hyphen or an apostrophe. The total length should be at most 50 characters and the name should end with a lowercase letter. Note that the uppercase letters are A to Z plus these characters:

ÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ

and the lowercase letters are a to z plus these characters:

éçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß

Each word (the part between two separators: a space, ' or -) should be at least 2 characters long. The name should also start with an uppercase letter and end with a lowercase letter, and within a word no uppercase letter is allowed except the first one.

Examples of acceptable names:

Adam Klsld
Adam'odskdl
Adam'Ddlsl
Ùdam-ddkkdk
Addssd-Ddsdsd

I've tried a lot of things, but here is my latest attempt, which I still keep in my PHP file. I deleted the others in the chaos of unsuccessful attempts. (I use the `mb_ereg` function to match, so this is POSIX ERE.)

```
([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((^[\'\-\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)*
```

(This isn't necessarily my best attempt, but I thought it might help and give an idea of how much of a dork I am.)
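In PCRE, a `^` outside a character class is a start-of-subject anchor, not a negation. As a result, the repeated group in this attempt can never match after the first word. And because the whole pattern is not wrapped in `^...$`, a match anywhere in the string counts as success. The sketch below shows both effects. It assumes `preg_match` instead of `mb_ereg` and reduces the letter sets to A-Z/a-z for readability.

```php
<?php
// The question's attempt with the letter sets reduced to A-Z / a-z.
// Assumption: run through preg_match (PCRE) rather than mb_ereg.
$attempt = "~([A-Z][a-z]+){1}((^['\\-\\s])[A-Z][a-z]+)*~u";

// ^ inside the second group means "start of the string", not "not", so that
// group can never match after the first word: only "Adam" is matched here.
preg_match($attempt, "Adam Klsld", $m);
echo $m[0], "\n"; // prints Adam

// With no ^...$ around the whole pattern, any string that merely contains
// a capitalized word is accepted.
var_dump(preg_match($attempt, "123 Adam !!")); // int(1)
```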

hakre
  • Not the solution, but start with a variant that uses only A-Z for the letters. Then the pattern is not that large yet, and once it works it's easy to add the other letters. Also create yourself a test script in which you can run your match function against a set of strings whose outcome you already know. That way you can test, and therefore write, your regex more quickly. – hakre May 04 '13 at 00:16
  • What about names like `Jim McSomething`? Camel case definitely exists in not-too-obscure names. – Martin Ender May 04 '13 at 00:17
  • Which language uses these characters? – Casimir et Hippolyte May 04 '13 at 00:19
  • IMO the best solution is: don't validate names, just let it be. – elclanrs May 04 '13 at 00:19
  • Why are you validating names? BTW I've heard `!xobile` is a name too :| – Ejaz May 04 '13 at 00:20
  • Heard about Leonardo da Vinci? – chelmertz May 04 '13 at 00:22
  • So should I just let it be? Is that what you're suggesting? Someone should really make a REST or SOAP validation service for names and the like; it would make our world much easier. – Seeking Knowledge May 04 '13 at 00:24
  • Also, if you're that precise about upper and lower case: `ß` appears in your uppercase list. Although there is a (pretty much unused) uppercase version, the character you have there is the lowercase one, and it will never start a German word or name (German being, as far as I know, the only language that still uses the character). – Martin Ender May 04 '13 at 00:25
  • Yeah, validating names is an uphill battle. There are simply waaay too many variables to account for. The African name @Ejay mentioned, !xobile, is a good example. If you insist on at least a cursory check of names (so that they are not just garbage), look into [Unicode characters](http://www.regular-expressions.info/unicode.html#prop), e.g. `\pL` for Unicode letters. – Sverri M. Olsen May 04 '13 at 00:27
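hakre's suggestion above can be sketched as a small test script: an ASCII-only first draft of the pattern, checked against inputs whose expected outcome is known. The pattern is an assumption, written for this sketch from the question's rules:

- words are separated by a single separator;
- every word has at least 2 letters;
- an uppercase letter follows a space;
- either case may follow ' or -.

Once every case passes, the accented letters can be added to the classes.

```php
<?php
// ASCII-only first draft of the rules (an assumption for this sketch):
// - the first word starts with A-Z,
// - a word after a space starts with A-Z,
// - a word after ' or - starts with A-Z or a-z,
// - every word continues with at least one lowercase letter, so it is
//   >= 2 characters long and the whole name ends with a lowercase letter.
$pattern = "~^[A-Z][a-z]+(?:(?: [A-Z]|['-][A-Za-z])[a-z]+)*\z~";

// Inputs with a known expected outcome (true = should be accepted).
$cases = [
    "Adam Klsld"    => true,
    "Adam'odskdl"   => true,
    "Adam'Ddlsl"    => true,
    "Addssd-Ddsdsd" => true,
    "Adam klsld"    => false, // lowercase after a space
    "Adam K"        => false, // one-letter word
    "AdaM"          => false, // uppercase inside a word
    "adam"          => false, // lowercase first letter
    "Adam  Klsld"   => false, // two separators in a row
];

$failures = 0;
foreach ($cases as $name => $expected) {
    $actual = preg_match($pattern, $name) === 1;
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: ", $name, "\n";
    }
}
echo $failures === 0 ? "all passed\n" : "$failures failure(s)\n";
```

With this pattern the script prints `all passed`. A wrong pattern instead lists each input that got the unexpected result.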

3 Answers


Does this regex check what you need?

[Image: screenshot of the proposed regex]

(You'll have to add the weird characters inside each pair of brackets, of course.)

RelevantUsername
  • I think this doesn't cover the rules about upper/lowercase letters following spaces, ' and -. – DougW May 04 '13 at 00:30
  • @SeekingKnowledge Read the comment under the screenshot: to get the correct regex, replace every `[A-Z]` with your `[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊblablabla]`. It will work even with the "Ù". I only used `[A-Z]` for readability. – RelevantUsername May 04 '13 at 00:31
  • @DougW It might not work with the "long dash" or the comma; you'll have to add them to the middle bracket, here: `[-'\s]`. It does match `Adam Test` and not `Adam test`. Not sure that's what he's looking for. – RelevantUsername May 04 '13 at 00:34
  • It doesn't cover the different rules for characters following a space vs. those following a dash or apostrophe, though. – DougW May 04 '13 at 00:34
  • Yeah, I'm not entirely sure. He mentioned a total length of 50 characters and ending in a lowercase letter. Some of that should probably be handled outside the regex, though. – DougW May 04 '13 at 00:51
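The two suggestions above can be combined: replace `[A-Z]`/`[a-z]` with the full sets, and check the length outside the regex. A sketch could build the classes once from the question's lists and count characters with `mb_strlen`, because `strlen` counts UTF-8 bytes rather than characters. This sketch rests on several assumptions:

- the hypothetical function name `isValidName`;
- reading "50 characters" as a maximum;
- a source file saved as UTF-8;
- a word-structure regex written here from the question's rules.

```php
<?php
// Letter sets from the question (A-Z / a-z plus the listed accented letters).
// The duplicates in the lowercase list are harmless inside a character class.
const UPPER = 'A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ';
const LOWER = 'a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß';

// Hypothetical helper: the regex checks the word structure, mb_strlen checks
// the length rule (it counts characters, not bytes).
function isValidName(string $name, int $maxLength = 50): bool
{
    if (mb_strlen($name, 'UTF-8') > $maxLength) {
        return false;
    }
    $up = UPPER;
    $lo = LOWER;
    // u: treat pattern and subject as UTF-8 characters; \z: the very end.
    $pattern = "~^[$up][$lo]+(?:(?: [$up]|['-][$up$lo])[$lo]+)*\z~u";
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidName("Ùdam-ddkkdk"));        // bool(true)
var_dump(strlen("Ùdam-ddkkdk"));             // int(12): bytes
var_dump(mb_strlen("Ùdam-ddkkdk", 'UTF-8')); // int(11): characters
```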

I wouldn't exactly suggest you use this... but I think this does what you want?

```
^([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((([\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)|((['\-])([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]|[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß])[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+))*$
```

Here it is outside a code block so you can see how insane it is (I think some characters get stripped here, though):

^([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((([\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)|((['-])([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]|[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß])[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+))*$

DougW
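To run DougW's pattern in PHP, it needs delimiters. It also needs the `u` modifier, so that the accented letters are matched as whole characters. The sketch below assumes `preg_match` rather than the question's `mb_ereg` and a source file saved as UTF-8. The `D` modifier is an extra, not part of the answer.

```php
<?php
// DougW's pattern body, unchanged, between ~ delimiters.
// u: treat the UTF-8 subject and the accented letters as single characters.
// D: make $ match only at the very end (otherwise "Name\n" would also pass).
$pattern = "~^([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((([\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)|((['\-])([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]|[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß])[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+))*$~uD";

// The question's examples of acceptable names.
$examples = ["Adam Klsld", "Adam'odskdl", "Adam'Ddlsl", "Ùdam-ddkkdk", "Addssd-Ddsdsd"];
foreach ($examples as $name) {
    echo $name, ": ", preg_match($pattern, $name), "\n"; // prints 1 for each
}

// A lowercase letter after a space is rejected.
echo "Adam klsld: ", preg_match($pattern, "Adam klsld"), "\n"; // prints 0
```

The 50-character limit is still not checked by this pattern and would need a separate `mb_strlen` test.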

You can use this to avoid the accented-characters issue:

```
$pattern = "~^[\p{Lu}ß]\p{Ll}*+(?>(?> [\p{Lu}ß]|['-]\p{L})\p{Ll}*+)*$~u";
if (preg_match($pattern, $name)) {
    // the name is valid
}
```
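`\p{Lu}` and `\p{Ll}` are Unicode properties, read per character thanks to the `u` modifier. So this first pattern accepts any uppercase or lowercase letter, not only the ones listed in the question. The sketch below shows that difference. The explicit-list variant is written only for comparison: it mirrors the pattern's structure with the question's letter sets. The Polish name is an illustrative input.

```php
<?php
// First pattern from this answer, based on Unicode properties.
$byProperty = "~^[\p{Lu}ß]\p{Ll}*+(?>(?> [\p{Lu}ß]|['-]\p{L})\p{Ll}*+)*$~u";

// Same structure with the question's explicit letter lists (comparison only).
$up = 'A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ';
$lo = 'a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß';
$byList = "~^[$up][$lo]*+(?>(?> [$up]|['-][$up$lo])[$lo]*+)*$~u";

// Ł (U+0141) is an uppercase letter (\p{Lu}) but is not in the question's list.
var_dump(preg_match($byProperty, "Łukasz Nowak")); // int(1)
var_dump(preg_match($byList, "Łukasz Nowak"));     // int(0)
```

Whether that wider acceptance is desirable depends on how strict the check is meant to be.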

Or for a more specific set of characters:

```
$pattern = "~(?(DEFINE)(?<Up>[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]))
             (?(DEFINE)(?<Lo>[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]))
             ^\g<Up>\g<Lo>*+(?>(?>\h\g<Up>|['-]\g<Up>?+\g<Lo>)\g<Lo>*+)*+$~ux";

if (preg_match($pattern, $name, $matches)) {
    // the name is valid
}
```

Or the same in a shorter way:

```
$pattern = "~(?(DEFINE)(?<Up>[A-ZÀ-ÖØ-ݟߌ]))
             (?(DEFINE)(?<Lo>[a-zà-öø-ýÿßœ]))
             ^\g<Up>\g<Lo>*+(?>(?>\h\g<Up>|['-]\g<Up>?+\g<Lo>)\g<Lo>*+)*+$~ux";
```
Casimir et Hippolyte
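The shorter classes rely on Unicode code point order:

- À-Ö is U+00C0 to U+00D6.
- Ø-Ý is U+00D8 to U+00DD, which skips the multiplication sign × (U+00D7).
- Ÿ, ß and Œ lie outside those ranges and are listed individually.
- The lowercase class mirrors this from U+00E0, skipping ÷ (U+00F7).

The sketch below checks that the shorter classes accept the same non-ASCII characters as the explicit lists. It assumes that scanning U+0080 to U+017F is enough (Œ/œ are U+0152/U+0153 and Ÿ is U+0178). The helpers `utf8()` and `matching()` are hypothetical names introduced for this sketch.

```php
<?php
// Hypothetical helper: encode a code point below U+0800 as UTF-8
// (avoids depending on mbstring for this check).
function utf8(int $cp): string
{
    return $cp < 0x80
        ? chr($cp)
        : chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: list every code point in [$from, $to] whose
// character matches the given character class in UTF-8 mode.
function matching(string $class, int $from = 0x80, int $to = 0x17F): array
{
    $hits = [];
    for ($cp = $from; $cp <= $to; $cp++) {
        if (preg_match("~^$class\z~u", utf8($cp)) === 1) {
            $hits[] = $cp;
        }
    }
    return $hits;
}

$pairs = [
    // [short class from the answer, explicit list from the question]
    ['[A-ZÀ-ÖØ-ݟߌ]', '[ÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]'],
    ['[a-zà-öø-ýÿßœ]', '[éçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]'],
];
foreach ($pairs as [$short, $explicit]) {
    echo matching($short) === matching($explicit) ? "same set\n" : "different sets\n";
}
```

Both lines print `same set`, so for these letters the short and explicit forms are interchangeable.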