1

I'm trying to cleanse a data set from erroneous phone number entries. Having trouble making the regular expression for the filter in MySQL.

The structure is the following:
First digit is in 2-9
Second and third digits can be any numeral except they may not be the same number
Forth digit is in 2-9
Fifth and sixth digits can be any numeral except '11'

I've landed on a few rather elaborate reg expressions which didn't quite work; but I'm sure there is a simplistic approach.

A "valid" number might look like:
2028658680
7137038891

My filter usually misses cases such as:
6778914351
7777777777
6178116678

Note that these numbers are completely made up.

jameselmore
  • 442
  • 9
  • 20
  • can you provide samples which match your cases – marvel308 Aug 15 '17 at 19:09
  • Added some fabricated examples – jameselmore Aug 15 '17 at 19:14
  • 1
    You can't meet these criteria with a regular expression. Specifically, the part where some characters' requirements rely on others. – Dan Fego Aug 15 '17 at 19:17
  • `[2-9](\d)(?!\1)\d[2-9](?!11)\d{2}` –  Aug 15 '17 at 19:18
  • 2
    You are stuck here, because MySQL does not support *backreferences* and thus your second condition cannot be coded with the MySQL regex. Well, it does not support lookarounds either, and that makes it a bit hard to implement the last condition. No way to solve the issue with a MySQL regex. – Wiktor Stribiżew Aug 15 '17 at 19:32

1 Answers1

2

This is possible, but it will be long and ugly. With a more robust regex engine you can do lookaround and even conditional statements, but MySQL doesn't support such things as far as I know.

^[2-9](?:0[1-9]|1[02-9]|2[013-9]|3[0-24-9]|4[0-35-9]|5[0-46-9]|6[0-57-9]|7[0-689]|8[0-79]|9[0-8])[2-9](?:1[02-9]|[02-9]1|[02-9]{2})[0-9]{4}$

https://regex101.com/r/qPuS5W/1

Explanation:

[2-9] First digit is any number from 2 to 9.

(?:0[1-9]|1[02-9]|2[013-9]|3[0-24-9]|4[0-35-9]|5[0-46-9]|6[0-57-9]|7[0-689]|8[0-79]|9[0-8]) Non capturing group that contains 10 alternatives starting with each number 0 to 9 followed by any number except that number.

(?:1[02-9]|[02-9]1|[02-9]{2}) Non capturing group that matches either 1 followed by a number that isn't 1, a number that isn't 1 followed by 1, or two numbers that aren't 1.

[0-9]{4} 4 of any number.

CAustin
  • 4,525
  • 13
  • 25