-1

My client sent me name data as a Name string which includes the last, first, and middle names in a single entry. I need them split into LastName, FirstName, and MiddleName. I have found some scripts online, but they don't serve my purposes because they either (1) use a different format, or (2) don't handle edge cases very well. See the examples below:

  1. Nightingale, Florence -> Florence Nightingale
  2. Bond, James Bond -> James Bond Bond
  3. Abbott, Edwin A. -> Edwin A. Abbott

Can someone help me write a SQL Server script that splits a string into the various pieces I'm looking for?

Myles Baker
  • 3,600
  • 2
  • 19
  • 25
  • What have you tried so far? Where does it fail? The examples look pretty simple, everything right of the comma followed by everything left of the comma. – Hart CO Aug 05 '14 at 16:09
  • Names are far more complicated than that... especially if you're internationalized. A first name or last name can contain a space, for example. There can be honorifics or titles. Some cultures reverse the semantic meaning of first and last name. You're not going to find a single answer that works in all cases. For instance, a last name of "Van Damme" is not a last name="Van" and middle name of "Damme". – pmbAustin Aug 05 '14 at 18:06
  • Ok, so maybe this wasn't clear, but I answered my own question when I originally posted this. My solution worked for my problem and I couldn't find anything similar on the web, so I thought I would post it. – Myles Baker Aug 05 '14 at 18:14
  • I have to wait two days. – Myles Baker Aug 05 '14 at 21:23
  • 1
    @MylesBaker thanks for posting the solution ! :) – pty Mar 16 '20 at 18:39

5 Answers5

4

Please note the following:

  1. Always request normalized data to ensure the highest data quality. I tried to enumerate all possible cases for last, first, and middle name combinations but I'm sure I did not get all of them.
  2. My script requires the format: LastName@DELIMITER1@DELIMITER2FirstName@DELIMITER2MiddleName, but can be easily altered for other formats.
  3. This script does not separate tiles like Dr., or handle suffixes.
  4. Credit to MemKills for the idea of the test data set, which I expanded.

>

DECLARE @DELIMITER1 varchar(1), @DELIMITER2 varchar(1), @MAX_LENGTH int
SET @DELIMITER1 = ','
SET @DELIMITER2 = ' '
SET @MAX_LENGTH = 50

SELECT  [Name],
    SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName,                   -- Less one char for @DELIMITER1
    SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH) AS FirstAndMiddle,   -- Plus two for @DELIMITER1 and @DELIMITER2
    CASE 
        -- Middle name follows two-name first names like Mary Ann 
        WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
            THEN SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH)
        ELSE NULL
    END AS MiddleName,

    CASE 
        -- Count the number of @DELIMITER2. Choose the string between the @DELIMITER1 and the final @DELIMITER2. 
        WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
            Then SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 2, 
                 (LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH))
                 - LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH))))
        ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)
    END AS FirstName
FROM 
(
    SELECT  [Name] = 'Zzz, A' UNION ALL
    SELECT  'de Zzz, Aaa' UNION ALL
    SELECT  'Zzz, Aaaa' UNION ALL
    SELECT  'Zzz, A B' UNION ALL
    SELECT  'Zzz, Aaaa Bbbb' UNION ALL
    SELECT  'de Zzz, Aaaa' UNION ALL
    SELECT  'de Zzz, Aaaa B' UNION ALL
    SELECT  'van Zzz, Aaaa B' UNION ALL
    SELECT  'Yyy-Zzz, Aaaa B' UNION ALL
    SELECT  'd''Zzz, Aaaa B' UNION ALL
    SELECT  'Zzz, Aaaa Bbbb C' UNION ALL
    SELECT  'Zzz, Aaaa Bbbb Cccc'
) AS X
Myles Baker
  • 3,600
  • 2
  • 19
  • 25
1

Try this code out. I find it to be a bit more efficient. Please feel free to modify or improve on it. Thanks.


DECLARE @FullName VARCHAR(60),
        @FirstName VARCHAR(30),
        @LastName VARCHAR(30),

        @MiddleInitialPrep VARCHAR(60) = null,
        @MiddleInitial VARCHAR(1) = null

SET @FullName = 'Dr. John Edward Doe III'

-- NAME CLEAN UP TO REMOVE PREFIXES AND SUFFIXES
SET @FullName = REPLACE(@FullName, 'Mr. ', '')
SET @FullName = REPLACE(@FullName, 'Mr ', '')
SET @FullName = REPLACE(@FullName, 'Mrs. ', '')
SET @FullName = REPLACE(@FullName, 'Mrs ', '')
SET @FullName = REPLACE(@FullName, 'Ms. ', '')
SET @FullName = REPLACE(@FullName, 'Ms ', '')
SET @FullName = REPLACE(@FullName, 'Miss ', '')
SET @FullName = REPLACE(@FullName, 'Dr. ', '')
SET @FullName = REPLACE(@FullName, 'Dr ', '')
SET @FullName = REPLACE(@FullName, ' Jr.', '')
SET @FullName = REPLACE(@FullName, ' Jr', '')
SET @FullName = REPLACE(@FullName, ' Sr.', '')
SET @FullName = REPLACE(@FullName, ' Sr', '')
SET @FullName = REPLACE(@FullName, ' III', '')
SET @FullName = REPLACE(@FullName, ' II', '')

-- RETRIEVE FIRST AND LAST NAMES
SET @FirstName = LEFT(@FullName, NULLIF(CHARINDEX(' ', @FullName) - 1, -1))
SET @LastName = RIGHT(@FullName, ISNULL(NULLIF(CHARINDEX(' ', REVERSE(@FullName)) - 1, -1), LEN(@FullName)))

-- ISOLATE MIDDLE INITIAL
SET @MiddleInitialPrep = REPLACE(@FullName, @FirstName, '')
SET @MiddleInitialPrep = REPLACE(@MiddleInitialPrep, @LastName, '')
SET @MiddleInitial = REPLACE(@MiddleInitialPrep, ' ', '')

SELECT @FirstName First_Name, @MiddleInitial Middle_Initial, @LastName Last_Name
  • I think the "true" answer to the question is to get the data delivered to you delimited from the source. Transformations on this data could have some unforeseen consequences. – Myles Baker Nov 04 '14 at 17:29
1

The code below works with Last, First M name strings. Substitute "Name" with your name string column name. Since you have a period as a final character when there is a middle initial, you would replace the 2's with 3's in each of the lines (2, 6, and 8)- and change "RIGHT(Name, 1)" to "RIGHT(Name, 2)" in line 8.

SELECT  SUBSTRING(Name, 1, CHARINDEX(',', Name) - 1) LastName ,
    CASE WHEN LEFT(RIGHT(Name, 2), 1) <> ' '
         THEN LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99))
         ELSE LEFT(LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99)),
                   LEN(LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99)))
                   - 2)
    END FirstName ,
    CASE WHEN LEFT(RIGHT(Name, 2), 1) = ' ' THEN RIGHT(Name, 1)
         ELSE NULL
    END MiddleName
Frank_C
  • 31
  • 1
0

Great solution. I made a couple modifications to work for my case where the delimiters are spaces and the middle name is just the middle initial (that sometimes is not present). This following solution parsed even multi spaced names like for example: "Jo Ann Taylor Haynes" without a middle initial.

SET @DELIMITER1 = ' '
SET @DELIMITER2 = ' '
SET @MAX_LENGTH = 50

SELECT  [Name],
    SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName,                  

    SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH) AS FirstAndMiddle,   
    CASE 

        WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH), @DELIMITER2, '')) = 1
            THEN SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+1, @MAX_LENGTH)
        ELSE NULL
    END AS MiddleName,

    CASE 

        WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH), @DELIMITER2, '')) = 1
            Then SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 1, 
                 (LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH))
                 - LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+1, @MAX_LENGTH))))
        ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH)
    END AS FirstName
-4
select substr(
'santhosh kumar kota'
,1,
instr(
'santhosh kumar kota'
,' ' 
,1
,1)
) as fname
,substr('santhosh kumar kota'
,instr(
'santhosh kumar kota'
,' ' 
,1
,1)
,(instr(
'santhosh kumar kota'
,' ' 
,1
,2)-instr(
'santhosh kumar kota'
,' ' 
,1
,1)
)
)as mname
,substr('santhosh kumar kota'
,instr(
'santhosh kumar kota'
,' ' 
,1
,2)
,(length('santhosh kumar kota')+1)-instr(
'santhosh kumar kota'
,' ' 
,1
,2)
)as lname
from dual
/
rene
  • 41,474
  • 78
  • 114
  • 152