0

I have a dataset like this:

Juan Corona Jane L Doe John Q. Public R S Fitzgerald

I need to clean this up so it's:

Juan Corona Jane L. Doe John Q. Public R. S. Fitzgerald

But since MySQL doesn't support regex search and replace I feel like I'm in a bit of a stumper.

Better to do with is in the database, but as a fallback I can do it in PREG_REPLACE, where I know it can be done.

Slam
  • 3,125
  • 1
  • 15
  • 24
  • 1
    Easier to do it before hitting the database as MySQL doesn't provide good APIs for this (it is very doable, but very ugly). – Aluan Haddad Mar 07 '18 at 16:57
  • It's already in the database. It's legacy data I've inherited. I could dump it and fix it with sed and reimport, or filter it with preg_replace before display. but was hoping for a mysql only solution. – Slam Mar 08 '18 at 17:43
  • You know... maybe there's a way to break the names with substring_index and concat a period if the part has a length of 1? – Slam Mar 08 '18 at 17:46
  • Some things are best done in app code. – Rick James Mar 12 '18 at 02:27

1 Answers1

2
SELECT TRIM(  -- remove sourrounding spaces
    REPLACE(  -- remove remaining double spaces
    REPLACE( 
    REPLACE( 
    REPLACE( 
    -- ...  
    REPLACE( 
    REPLACE( 
    REPLACE( 
        REPLACE( CONCAT( ' ' , 'W A Mozart', ' '), ' ', '  '), 
    ' A ', ' A.'),
    ' B ', ' B.'),
    ' C ', ' C.'),
    -- ...
    ' X ', ' X.'),
    ' Y ', ' Y.'),
    ' Z ', ' Z.'),
    '  ', ' ')  -- remove remaining double spaces
);

Ahem, that is a pragmatic solution ... Plus, you can easily specify the letters to be regarded as an abbreviation or not.

The sourrounding space padding helps to find single letters at the beginning and the end of the string.

The initial double space padding is required to make the functions inner workings match shared spaces between two letters.

Remaining helper spaces are removed by the two outermost function calls of TRIM and REPLACE.

Remember, the matching is case-sensitive.

nil_s
  • 32
  • 4
  • That's ugly but it would work! Along this lines, do the REPLACE functions need to be nested, or can they be in serial? Tracking that many nestings is painful. – Slam Mar 08 '18 at 17:41
  • The REPLACE calls must be nested, because otherwise you would have to state the input string n times and end up with n repeated, only slightly modified output strings. E.g. `W A. Mozart W A Mozart ... W. A Mozart` – nil_s May 04 '18 at 08:28