1

I need to parse a list of FullNames into First and Last Name. If a middle name is included, it should be included in the fist name field.

John Smith would be:

FirstName = John

LastName = Smith

John J. Smith would be:

FirstName = John J.

LastName = Smith

The issue is the names might be either Thai or English character set. I need to properly parse either set. I have tried just about everything...

DECLARE @FullName NVARCHAR(MAX) = N'กล้วยไม้ สวามิวัศดุ์'
--DECLARE @FullName NVARCHAR(MAX) = N'Mark C. Wilson'

SELECT 
    LEN(@FullName) AS StringLength,
    LEN(@FullName) - LEN(REPLACE(@FullName,N' ', N'')),
    LEN(REPLACE(@FullName,N' ', N'')),
    @FullName AS FullName,
    REVERSE(@FullName) AS ReverseName, -- This is obviously no Reverse of the string
    CHARINDEX(N' ', REVERSE(@FullName)) AS LastSpaceLocation,
    CHARINDEX(N' ', @FullName) AS FirstSpaceLocation,
    LEN(@FullName) AS LenString,
    STUFF(@FullName, 1, CHARINDEX(N' ', @FullName), N'') as FirstName,
    RIGHT(@FullName, LEN(@FullName) - CHARINDEX(N' ', @FullName) + 1) as LastName,
    LEFT(@FullName, LEN(@FullName) - CHARINDEX(N' ', REVERSE(@FullName))) AS FirstName,
    STUFF(RIGHT(@FullName, CHARINDEX(N' ', REVERSE(@FullName))),1,1,N'') AS LastName,
    LEN(@FullName),
    REVERSE(@FullName),
    REVERSE(' '),
    LEN(@FullName) - CHARINDEX(reverse(' '), REVERSE(@FullName)) - LEN(' ') + 1

The REVERSE simply does not work when the Thai character set is used.

Community
  • 1
  • 1
Mark
  • 53
  • 6
  • This happens because reversing the Thai name screws up the combining characters, which get "attached" to the space, causing a simple lexicographical search to fail. You can "fix" this, to some extent, by forcing a binary collation (`SELECT CHARINDEX(N' ', REVERSE(N'กล้วยไม้ สวามิวัศดุ์') COLLATE Latin1_General_BIN2)`), but you may well run into other oddities. Once you start branching out to more languages (like Japanese, or other languages with name components in an order different from English, or no last names at all) it'll only get harder. Consider leaving names alone where possible. – Jeroen Mostert Jan 17 '19 at 15:04
  • You seem to have [already asked](https://stackoverflow.com/questions/53233664/sql-server-parsing-thai-language-full-name-to-first-last) this question, and the existing answer has demonstrated that your approach based on reversing strings is fundamentally flawed and should be revised. It is clear that you have improved the query since the time the original question was asked, but the question stays the same; so you may need editing the original Q instead of asking a new one. VTC/duplicate. – Be Brave Be Like Ukraine Jan 17 '19 at 15:50
  • 1
    Possible duplicate of [SQL Server Parsing Thai Language Full Name to First Last](https://stackoverflow.com/questions/53233664/sql-server-parsing-thai-language-full-name-to-first-last) – Be Brave Be Like Ukraine Jan 17 '19 at 15:50
  • I saw this and it was not a valid solution for my issue. Reverse() did not work properly with the character set. – Mark Jan 17 '19 at 20:19

2 Answers2

1

I can't read Thai (I'm not that bright), but perhaps this may help.

Here we are using a CROSS APPLY to "fix" the string, and then it is a small matter of PasrName() and Concat()

I should add, parsing names is a slippery slope. One needs to consider

  • Multi Word Last Names ie De la Cruz
  • Suffix ie. Richard R Cappelletti MD

Example

Declare @YourTable table (FullName nvarchar(100))
Insert Into @YourTable values
 ('John Smith')
,('John J. Smith')
,(N'กล้วยไม้ สวามิวัศดุ์')

Select A.*
      ,LastName  = replace(parsename(S,1),'|','.')
      ,FirstName = replace(concat(parsename(S,4),' '+parsename(S,3),' '+parsename(S,2)),'|','.')
 From  @YourTable A
 Cross Apply ( values (replace(replace(FullName,'.','|'),' ','.'))) B(S)

Returns

FullName          LastName    FirstName
John Smith        Smith       John
John J. Smith     Smith       John J.
กล้วยไม้ สวามิวัศดุ์    สวามิวัศดุ์     กล้วยไม้

EDIT 2008 Version

Select A.*
      ,LastName  = replace(parsename(S,1),'|','.')
      ,FirstName = replace( IsNull(parsename(S,4),'') + IsNull(' '+parsename(S,3),'') + IsNull(' '+parsename(S,2),''),'|','.')
 From  @YourTable A
 Cross Apply ( values (replace(replace(FullName,'.','|'),' ','.'))) B(S)
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
  • I apologize, but unfortunately this is in a SS 2008 database. I realize this is near end of life, but... Anyway, concat was introduced in SQL Server 2012. – Mark Jan 17 '19 at 15:25
  • @Mark: Fortunately `CONCAT` is just a convenience in this case to deal with `NULL`s. Concatenating the strings with `+` and wrapping the concatenated expressions with `ISNULL(..., '')` will do to downgrade it. – Jeroen Mostert Jan 17 '19 at 15:28
  • @Mark Seed EDIT 2008 Version – John Cappelletti Jan 17 '19 at 15:30
  • @Mark Just saying ... Extended Support for 2008 ends July 2019. Please tag your questions with 2008 to avoid assumptions. – John Cappelletti Jan 17 '19 at 15:31
1

I'm Thai and one thing I know is that Thai people don't do middle name.

Yingy
  • 74
  • 1
  • 5