2

I have a list of IDs which are created in various third party applications systems and manually added to our system. I need to try and auto increment these IDs based on the largest number. The values are either entirely a number or any number of letters followed by any number of numbers.

For example:

Array ( [works_id] => MD001 [num] => 0 ) 
Array ( [works_id] => WX9834V [num] => 0 ) 
Array ( [works_id] => WK009 [num] => 0 ) 
Array ( [works_id] => W4KHA2 [num] => 0 ) 
Array ( [works_id] => MD001 [num] => 0 ) 
Array ( [works_id] => DE1234 [num] => 0 ) 
Array ( [works_id] => 99 [num] => 99 ) 
Array ( [works_id] => 100 [num] => 100 )

In the above example, I would need to return 'DE' and 1234 as 1234 is the largest number which matches the pattern (WX9834V does not match as it is LLNNNNL)

So far I have tried:

SELECT  works_id, CAST(works_id as UNSIGNED) as num 
FROM table 
WHERE (works_id REGEXP '^[a-zA-Z]+[0-9]' or works_id REGEXP '^[0-9]+$')

But this returns all rows and returns 0 for the number part unless it is only made up of numbers - how can I return only 'DE' and 1234 from the above?

GMB
  • 216,147
  • 25
  • 84
  • 135
bhttoan
  • 2,641
  • 5
  • 42
  • 71
  • Which version of MySQL are you using? – GMB Mar 11 '19 at 21:30
  • Ah yes, 5.7.23 - not using 8 as yet – bhttoan Mar 11 '19 at 21:34
  • I'm unclear on what exactly you need. You only want to match one of the IDs; the one that contains the largest number not followed by letters? What if your input contains `DE1234` and `DE1237`? Regex is really bad at doing math. – CAustin Mar 11 '19 at 21:35
  • I don't mind actually if it returns all 'valid' values (i.e. DE,1234 and DE,1237) and then I can use PHP to find the highest of the output - the issue is there are over 500k records in the table and I would rather get MySQL to do the heavy lifting rather than looping 500k times in PHP – bhttoan Mar 11 '19 at 21:39
  • What application language are you using? It will be easier there. – Rick James Mar 14 '19 at 14:56

1 Answers1

2

From the comments, I understant that your primary intent is to select the records that do match your format spec (possibly characters at the beginning of the string, then mandatory numbers until the end of string).

The problem with you current query is that the first regexp, '^[a-zA-Z]+[0-9]' is too permissive: it does allow non-numbers characters at the end of the field, and would be better written '^[a-zA-Z]+[0-9]+$'

Bottom line, the two regexes can be combined into one:

SELECT works_id 
FROM mytable
WHERE works_id REGEXP '^[a-zA-Z]*[0-9]+$'

The regexp means:

^           beginning of the string
[a-zA-Z]*   0 to N letters
[0-9]+      at least one digit
$           end of string

In this db fiddle with your test data, this returns:

| works_id |
| -------- |
| MD001    |
| WK009    |
| MD001    |
| 99       |
| 100      |

NB : in MySQL pre-8.0, splitting the string in order to find the max numerical pain is hard to do, since functions such as REGEXP_REPLACE are not available. It is probably easier to do this in your application (unless you have a very large numbers of matching records...). You can have a look at this post or this other one for solutions that mostly rely on MySQL functions.

GMB
  • 216,147
  • 25
  • 84
  • 135