I have a large table of 30 million records that contains a free-form text field which may contain names in any position and with any salutation, or not salutation at all.
My job is to mask out the names with Xxxxx Xxxxx to preserve privacy.
I have access to a large surnames database that defines for me what constitutes a name.
Using SQL Server 2012, what is the most efficient technique I can use for this task?
EDIT
Okay, I've got something working pretty decently that involves a Full-Text index/search, the names database, and a stored procedure.
However, I've run into a rather peculiar problem. I'm using a CONTAINS predicate (CONTAINS([textvaluefield], @namestring) where SET @namestring = 'NEAR((Dr.,'+@name+'), 1, TRUE)'.
This works perfectly except when the salutation in the [textvaluefield] is "DR." instead of "Dr.", i.e "DR. Johnson" is not getting picked up, yet "Dr. Johnson" is. I've verified this because if I change the value in the [textvaluefield] of a record from "DR." to "Dr.", yet leave everything else the same, that record will suddenly get picked up. If I revert the record to use "DR.", it will not get picked up again.
What make this bizarre is that I'm definitely using a case insensitive collation (Latin1_General_CI_AS). Anyone have any ideas?