3

I'm trying to check whether a string matches my country's phone number format: an area code (two digits, optionally preceded by a 0 and optionally enclosed in parentheses) followed by 8 or 9 digits, with an optional dash before the last 4 digits. These are some valid formats:


'00 00000000'
'000-000000000'
'000 00000-0000'
'00 0000-0000'
'(00) 0000-0000'
'(000) 000000000'

So far this is the working expression I have:


p = /0?\d{2}\s?-?\s?\d{4,5}\s?-?\s?\d{4}/

I tried to use a conditional to check whether the area code is inside parentheses with /?(\() 0?\d{2}\)|0?\d{2} \s?-?\s?\d{4,5}\s?-?\s?\d{4}/ but got this error: (repl):1: target of repeat operator is not specified: /?(\() 0?\d{2}\)|0?\d{2} \s?-?\s?\d{4,5}\s?-?\s?\d{4}

What am I doing wrong here?

Emma
aplneto

5 Answers

4

Do not validate phone numbers with regular expressions. I bet you do not want to reject users who occasionally type two consecutive spaces or something similar.

Instead, filter out all the non-digits and leading zeroes, and then validate. Like this:

number.gsub(/\D+/, '').gsub(/\A0+/) =~ /\d{8,9}/

I am not sure it would suit your needs out of the box, but I bet you’ve got the point. After all, [000]1234 56789 is an understandable phone number.
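As a rough sketch of this normalize-then-validate idea, not a definitive implementation: the helper name `plausible_phone?` is hypothetical, the zero-stripping step is given an explicit empty-string replacement, only one leading 0 is dropped, and the 10 to 11 digit rule is an assumption derived from the question's format (two-digit area code plus 8 or 9 digits).

```ruby
# Hypothetical helper: the name and the 10..11 digit rule are assumptions
# based on the question (2-digit area code followed by 8 or 9 digits).
def plausible_phone?(input)
  digits = input.gsub(/\D+/, '') # keep digits only
  digits = digits.sub(/\A0/, '') # drop one optional leading 0 of the area code
  digits.match?(/\A\d{10,11}\z/) # anchored, so the whole string must fit
end

puts plausible_phone?('(011) 91234-5678')
puts plausible_phone?('[011]9123  45678')
puts plausible_phone?('1234')
```

The anchored final check means that arbitrary text containing a short run of digits is rejected, while spacing and punctuation typos are tolerated.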

Aleksei Matiushkin
  • Thank you! That's a great idea, I'll surely do that. – aplneto May 20 '19 at 05:00
  • 2
    mudsie, I believe you meant `.gsub(/\A0+/, '')`. This does not tell us whether a string represents a valid phone number, but isn't that the idea? For example, `"Something like 99.473218% of Rubiests use snake case for names of variables and methods.".gsub(/\D+/, '').gsub(/\A0+/, '') =~ /\d{8,9}/ #=> 0`. I confirmed that by phoning that number and heard no ringtone. – Cary Swoveland May 20 '19 at 17:38
  • Doesn't this also not match the requirements because only the single leading 0 should be ignored? – Max May 20 '19 at 19:43
  • If the requirement is to scare users, or to play regex golf, then yes, the above does not work. If the goal is to validate the user’s input, then it is the only way to go. Because when they reject the number I entered, because they have a sophisticated regex there, you know, I close the page. Forever. After all, 555- numbers would pass, while they are surely invalid. – Aleksei Matiushkin May 20 '19 at 19:50
  • But if it accepts invalid input and then they can't call you because they stored an invalid number... that's also annoying – Max May 20 '19 at 19:51
  • The chance that I would accidentally swap two digits is likely 1M times higher than the chance that I would enter a Shakespeare sonnet there. – Aleksei Matiushkin May 20 '19 at 19:54
  • The same applies for emails, addresses etc. I was living at Street, 1bis, building 1/2-3, app. 5-5 (do not ask) and guess how many food delivery forms accepted that. – Aleksei Matiushkin May 20 '19 at 19:57
  • @AlekseiMatiushkin I agree. With that perspective I would just strip non-digits and verify it's non-empty. Stripping leading zeros is possibly destructive. – Max May 20 '19 at 22:30
3

My answer addresses your conditional idea for the optional parentheses.
Ruby has supported conditionals since v2.0. The syntax is (?(A)X|Y): if A is true, match X, else match Y.

  • Put an optional capturing group containing an opening parenthesis at start:
    ^(\()?
  • Later anywhere in the pattern check if it succeeded:
    (?(1)\) |[ -])
    If it succeeded, require a closing ) followed by a space; otherwise require [ -], a space or a dash.

So the whole pattern with conditional could be

^(\()?0?\d{2}(?(1)\) |[ -])\d{4,5}[ -]?\d{4}$

See the demo at Rubular or Regex101. Adjust further to your needs.
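A minimal Ruby sketch of how this conditional pattern behaves. It swaps ^ and $ for \A and \z, because in Ruby ^ and $ match at line boundaries rather than at the ends of the string; the constant name `CONDITIONAL` and the sample list are chosen only for illustration.

```ruby
# Same pattern as above, anchored with \A and \z (an adjustment, since
# Ruby's ^ and $ match at every line boundary).
CONDITIONAL = /\A(\()?0?\d{2}(?(1)\) |[ -])\d{4,5}[ -]?\d{4}\z/

samples = [
  '00 00000000',      # no parentheses, space separator
  '000-000000000',    # no parentheses, dash separator
  '(00) 0000-0000',   # balanced parentheses
  '(00 0000-0000',    # opening parenthesis only
  '00) 0000-0000'     # closing parenthesis only
]

samples.each { |s| puts "#{s.inspect} => #{s.match?(CONDITIONAL)}" }
```

The first three samples should match and the last two should not, since the closing parenthesis is required only when group 1 captured an opening one.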

An alternative is to use alternation, (?:\(abc\)|abc), which @CarySwoveland has already covered in his answer, but I think @AlekseiMatiushkin's answer will surely make life easier.

bobble bubble
  • 1
    The conditional expression is new to me and certainly a worthwhile addition to my toolbelt. Considering that it is so potentially useful, and that Ruby v2.0 has been out for so long, I'm surprised that I've not seen it used before here at SO. – Cary Swoveland May 20 '19 at 17:59
2

There might be several ways to validate these numbers. One way would be to write out all the possible phone number formats, then write an expression that covers them. Maybe something similar to:

[0-9]{2,3}(\s|-)[0-9]{4,5}-?[0-9]{3,4}

Test

re = /[0-9]{2,3}(\s|-)[0-9]{4,5}-?[0-9]{3,4}/m
str = '\'00 00000000\'
\'000-000000000\'
\'000 00000-0000\'
\'00 0000-0000\''

# Print the match result
str.scan(re) do |match|
    puts match.to_s
end

Demo

This snippet is just to show the capturing groups and that the expression might be valid:

const regex = /[0-9]{2,3}(\s|-)[0-9]{4,5}-?[0-9]{3,4}/gm;
const str = `'00 00000000'
'000-000000000'
'000 00000-0000'
'00 0000-0000'`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

RegEx

If this expression isn't what you want, it can be modified or changed at regex101.com.


RegEx Circuit

jex.im also helps to visualize the expressions.



Edit 1:

To handle (), we can add an optional opening and an optional closing parenthesis around the area code in our initial expression. Maybe something similar to this:

\(?[0-9]{2,3}\)?(\s|-)[0-9]{4,5}-?[0-9]{3,4}


Emma
  • 1
    Leading `[0-9]{2,3}` would match `999` which must be ruled out according to OP. – Aleksei Matiushkin May 20 '19 at 04:42
  • 1
    Yes, but if I put `0?\d{2}` it should do it, right? – aplneto May 20 '19 at 04:47
  • 1
    I just added more examples using parenthesis – aplneto May 20 '19 at 04:48
  • No, the question has not been changed. Examples were added, but the statement itself stays as it was. – Aleksei Matiushkin May 20 '19 at 04:49
  • @aplneto should do exactly what? This is not any better than your original approach and it has nothing to do with the real question about parentheses. – Aleksei Matiushkin May 20 '19 at 04:50
  • 1
    Hi @Emma, it's almost this, but it shouldn't match a string with just an opening or a closing parenthesis, like `(00 0000-0000`. Also, the area code must be an optional '0' followed by two digits. – aplneto May 20 '19 at 04:57
  • @AlekseiMatiushkin, I was talking about the area code format. That this `0?\d{2}` should match the format I explained right? – aplneto May 20 '19 at 04:58
  • 1
    @Emma It is, but this format should not be valid. What I want to validate in the expression is that if there is an opening parenthesis before the area code, there should also be a closing one. For example, `(0000000000` should not be valid, but `0000000000` and `(00)000000000` should. – aplneto May 20 '19 at 05:06
  • `'(000) 000000000'.match? re #=> false` and `'(000 000000000'.match? re #=> true`. Unfortunately, both are incorrect. – Cary Swoveland May 20 '19 at 07:11
2

I believe you can use the following regular expression.

R = /
    \A            # match beginning of string
    (?:           # begin a non-capture group
      \(0?\d{2}\) # match '(' then an optional `0` then two digits then ')'
    |             # or
      0?\d{2}     # match an optional `0` then two digits
    )             # end the non-capture group
    (?:           # begin a non-capture group
      [ ]+        # match one or more spaces
    |             # or
      -           # match a hyphen
    )             # end the non-capture group
    \d{4,5}       # match 4 or 5 digits
    -?            # optionally match a hyphen
    \d{4}         # match 4 digits
    \z            # match end of string
    /x            # free-spacing regex definition mode

arr = [
  '00 00000000',
  '000-000000000',
  '000 00000-0000',
  '00 0000-0000',
  '(00) 0000-0000',
  '(000) 000000000',
  '(000 000000000',
  '(0000) 000000000'
]

arr.map { |s| s.match? R }
  #=> [true, true, true, true, true, true, false, false]

The regex is conventionally written as follows.

R = /\A(?:\(0?\d{2}\)|0?\d{2})(?: +|-)\d{4,5}-?\d{4}\z/

This should be changed as follows if the leading digits cannot equal zero. (If, for example, '001-123456789' and '(12)-023456789' are invalid.)

R = /\A(?:\(0?[1-9]\d\)|0?[1-9]\d)(?: +|-)[1-9]\d{3,4}-?\d{4}\z/
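
A quick check of the non-zero variant, written with an unescaped `[1-9]` class in both area-code branches. This is only a sketch: the constant name `NONZERO` and the sample inputs other than the two invalid ones quoted above are illustrative assumptions.

```ruby
# Variant with unescaped [1-9] classes in both area-code branches.
# NONZERO is a name chosen for this sketch only.
NONZERO = /\A(?:\(0?[1-9]\d\)|0?[1-9]\d)(?: +|-)[1-9]\d{3,4}-?\d{4}\z/

{
  '011 91234-5678' => true,   # optional 0, then a non-zero area code
  '(11) 1234-5678' => true,   # parenthesized area code
  '001-123456789'  => false,  # area code starts with 0 after the optional 0
  '(12)-023456789' => false   # subscriber number starts with 0
}.each do |input, expected|
  puts "#{input.inspect}: #{input.match?(NONZERO)} (expected #{expected})"
end
```
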
Cary Swoveland
0

Don't do this unless you know you are working in a very, very limited scope e.g.

  • The numbers are being passed to a system that only accepts a specific format so you know that these exact formats and no others will work
  • The numbers are just read by a human, so you can let them figure it out and don't have to validate anything

Otherwise you should use a robust library like https://github.com/mobi/telephone_number (inspired by Google's libphonenumber)

Max