With such a lot of rows the only hint I can give you is: Change the design. Each name part should live in a separate column...
The following will work, but I promise it will be slow...
--set up a test db
USE master;
GO
CREATE DATABASE shnugo;
GO
USE shnugo;
GO
--your table, I added an ID-column
create table tbl_pat_soundex
(
ID INT IDENTITY --needed to distinguish rows
,col_str varchar(max)
);
GO
--A function, which will return a blank-separated string as a alphabetically sorted list of distinct soundex values separated by /
: "Smith A Steve" comes back as /A000/S310/S530/
CREATE FUNCTION dbo.ComputeSoundex(@str VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @tmpXML XML=CAST('<x>' + REPLACE((SELECT @str AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML);
RETURN (SELECT DISTINCT '/' + SOUNDEX(x.value('text()[1]','varchar(max)')) AS [se]
FROM @tmpXML.nodes('/x[text()]') A(x)
ORDER BY se
FOR XML PATH(''),TYPE).value('.','nvarchar(max)') + '/';
END
GO
--Add a column to store a computed soundex-chain permanently
ALTER TABLE tbl_pat_soundex ADD SortedSoundExPattern VARCHAR(MAX);
GO
--We need a trigger to maintain the computed soundex-chain on any insert or update
CREATE TRIGGER RefreshComputeSoundex ON tbl_pat_soundex
FOR INSERT,UPDATE
AS
BEGIN
UPDATE s SET SortedSoundExPattern=dbo.ComputeSoundex(i.col_str)
FROM tbl_pat_soundex s
INNER JOIN inserted i ON s.ID=i.ID;
END
GO
--test data
insert into tbl_pat_soundex(col_str) values
('Smith A Steve')
,('Steve A Smyth')
,('A Smeeth Stive')
,('Steve Smith A')
,('Smit Steve A')
,('Smit Steve') --no A
,('Smit A') --no Steve
,('Smit Smith Robert Peter A') --add noise
,('Shnugo'); --something else entirely
--check the intermediate result
SELECT *
FROM tbl_pat_soundex
/*
+----+---------------------------+-----------------------+
| ID | col_str | SortedSoundExPattern |
+----+---------------------------+-----------------------+
| 1 | Smith A Steve | /A000/S310/S530/ |
+----+---------------------------+-----------------------+
| 2 | Steve A Smyth | /A000/S310/S530/ |
+----+---------------------------+-----------------------+
| 3 | A Smeeth Stive | /A000/S310/S530/ |
+----+---------------------------+-----------------------+
| 4 | Steve Smith A | /A000/S310/S530/ |
+----+---------------------------+-----------------------+
| 5 | Smit Steve A | /A000/S310/S530/ |
+----+---------------------------+-----------------------+
| 6 | Smit Steve | /S310/S530/ |
+----+---------------------------+-----------------------+
| 7 | Smit A | /A000/S530/ |
+----+---------------------------+-----------------------+
| 8 | Smit Smith Robert Peter A | /A000/P360/R163/S530/ |
+----+---------------------------+-----------------------+
| 9 | Shnugo | /S520/ |
+----+---------------------------+-----------------------+
*/
--Now we can start to search:
DECLARE @StringToSearch VARCHAR(MAX)=' A Steve';
WITH SplittedSearchString AS
(
SELECT soundexCode.value('text()[1]','nvarchar(max)') AS SoundExCode
FROM (SELECT CAST('<x>' + REPLACE(dbo.ComputeSoundex(@StringToSearch),'/','</x><x>') + '</x>' AS XML)) A(x)
CROSS APPLY x.nodes('/x[text()]') B(soundexCode)
)
SELECT a.ID,col_str
FROM tbl_pat_soundex a
INNER JOIN SplittedSearchString s On SortedSoundExPattern LIKE '%/' + s.SoundExCode + '/%'
GROUP BY ID,col_str
HAVING COUNT(ID)=(SELECT COUNT(*) FROM SplittedSearchString)
ORDER BY ID
GO
--clean-up
USE master;
GO
DROP DATABASE shnugo;
Short explanation
This is how it works:
- The cte will use the same function to return a soundex-chain of alle the input's fragments
- The query will then
INNER JOIN
this with a LIKE
test --this will be sloooooow...
- The final check is, if the number of hits is the same as number of fragments.
And a final hint: If you want to search for an exact match, but you want to include different writings you can just directly compare the two strings. You might even place an index on the new column SortedSoundExPattern
. Due to the way of creation all kinds of "Steven A Smith", "Steeven a Smit" and even in differing order like "Smith Steven A" will produce exactly the same pattern.