10

I have a table with 117000 or so records. I need to perform a search that checks 3 separate fields for a given string pattern.

My where clause is as follows:

field1 LIKE '%' + @DESC + '%'
OR field2 LIKE '%' + @DESC + '%'
OR field3 LIKE '%' + @DESC + '%'

This seems to take about 24 seconds regardless of input...

Is there a better way to do this? Less than 10 (or 5!) seconds would be much more preferable.

Thanks for any help.

reedstonefood
IronicMuffin

7 Answers

13

Use Full Text Search and CONTAINS. LIKE cannot be optimized when searching in the middle of the field, i.e. when the LIKE pattern starts with '%', so it will always do a full table scan.
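
A minimal sketch of what that might look like, assuming full-text search is installed and the table has a single-column unique key to hang the full-text index on (the table, index, and catalog names here are illustrative, not from the question):

CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;

CREATE FULLTEXT INDEX ON dbo.MyTable (field1, field2, field3)
    KEY INDEX PK_MyTable;  -- must be a unique, non-nullable, single-column index

SELECT *
FROM dbo.MyTable
WHERE CONTAINS((field1, field2, field3), @DESC);

Keep in mind that CONTAINS does word and prefix matching against the full-text index rather than arbitrary substring matching, so the semantics are not identical to LIKE '%...%'.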

Remus Rusanu
  • you can use an index, with a little work, when doing LIKE '%'+string, see the link in my other comment. – KM. Oct 14 '09 at 21:19
  • @KM: Interesting trick, a reversed column. Also good analogy with the phone book, makes the case vividly clear. – Remus Rusanu Oct 14 '09 at 21:24
  • Nice trick, too bad there isn't anything similar for the LIKE '%text%' problem. – HLGEM Oct 14 '09 at 21:38
  • @KM: It definitely has to be a '%test%' form, so the reverse method is out, though I have seen that before and it's a clever trick. I will have to try the Full Text Search, but it looks like I'm going to have to convince the DBAs to index it properly. Thank you for the suggestion. I will see if Full Text/Contains boosts performance. I recall doing this on the old database with the exact same query and it was much faster...not sure what the new one is doing differently – IronicMuffin Oct 15 '09 at 15:55
  • After getting the DBAs to Full Text Index the tables, this brought me down to 1-3 seconds for a search. Thanks for the help! – IronicMuffin Oct 21 '09 at 19:23
  • how about searching at the beginning of the string? – user384080 Nov 04 '10 at 03:24
  • What if the parameter passed is null? – Doug Chamberlain Aug 06 '13 at 13:57
  • This is incorrect, LIKE can be optimized even when searching in the middle of the field. http://www.sqlservercentral.com/blogs/dwainsql/2014/03/26/getting-an-index-seek-to-speed-up-like-string-searches/ – Mr. TA Mar 16 '18 at 14:23
  • @Mr.TA narrowing a rowset for the purpose of a smaller full scan is not an *index optimization*. Since you are insisting on reviving a 9-year-old answer, you could at least look at a modern alternative like columnstore... – Remus Rusanu Mar 16 '18 at 16:36
  • @RemusRusanu 1, I did not say `LIKE` is the only, or the preferred, solution. 2, did you read the page I linked? It shows how to use an index, not how to narrow a rowset, whatever that means. – Mr. TA Mar 16 '18 at 17:00
3

Anytime you start a LIKE search with a wildcard, you're doing a scan. Unless you can narrow your search criteria to include the first character (which may not be feasible), you'll need to resort to Full Text Search.
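
For contrast, a quick sketch of the seekable form, using made-up table/index names and assuming an ordinary index on the column: when the pattern is anchored at the start, the optimizer can do a range seek instead of a scan.

CREATE NONCLUSTERED INDEX IX_MyTable_field1 ON dbo.MyTable (field1);

-- can seek on IX_MyTable_field1
SELECT * FROM dbo.MyTable WHERE field1 LIKE @DESC + '%';

-- leading wildcard: back to a scan
SELECT * FROM dbo.MyTable WHERE field1 LIKE '%' + @DESC + '%';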

Stuart Ainsworth
  • @Stuart Ainsworth said _Anytime you start a LIKE search with a wildcard, you're doing a scan_; that does not have to be true, see this: http://stackoverflow.com/questions/1388059/sql-server-index-columns-used-in-like/1395881#1395881 – KM. Oct 14 '09 at 21:15
  • Now you're just playing with semantics; you REVERSE the column and index it, and then reverse the like clause so it doesn't start with a wildcard. You're not starting a search with a wildcard, so therefore you're not doing a scan. I will admit that is an interesting solution, and I'll have to keep it in mind. – Stuart Ainsworth Oct 14 '09 at 23:33
  • This is incorrect, LIKE can be optimized even when searching in the middle of the field. http://www.sqlservercentral.com/blogs/dwainsql/2014/03/26/getting-an-index-seek-to-speed-up-like-string-searches/ – Mr. TA Mar 16 '18 at 14:25
  • So nearly 8 years later, I learned something new :) Never thought about removing the NULL values from a wildcard string search. – Stuart Ainsworth Mar 17 '18 at 15:19
1

Do you really need to start with a wildcard? Why? Often you can force users to type in at least the first character. I bring this up because some developers just use the wildcard out of habit, not because there is a requirement. In most cases users will be able to type the first character, unless the field stores long strings (like, say, official airport names). Otherwise you really need to use full-text indexing, although KM's trick with the reverse is pretty cool if you don't need the wildcard at the end (see the sketch below).

If you can avoid doing the performance-killing things, then do so.
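
A rough sketch of that reversed-column trick, assuming the wildcard is only needed at the front (an "ends with" search) and using made-up column and index names:

-- REVERSE is deterministic, so the computed column can be persisted and indexed
ALTER TABLE dbo.MyTable
    ADD field1_rev AS REVERSE(field1) PERSISTED;

CREATE NONCLUSTERED INDEX IX_MyTable_field1_rev ON dbo.MyTable (field1_rev);

-- "field1 LIKE '%' + @DESC" becomes a seekable prefix search on the reversed column
SELECT * FROM dbo.MyTable WHERE field1_rev LIKE REVERSE(@DESC) + '%';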

HLGEM
  • Yes, I do need a double sided wildcard for this query. I need to find something like 'banana' in 'Strawberry Banana Yogurt.' – IronicMuffin Oct 15 '09 at 15:54
1

I tried one possible solution. Before applying it, the query was not even returning results and was causing a connection timeout error.

My query had a date filter plus other criteria, and all the other criteria were LIKE searches. One keyword column was searched with LIKE '%abc%' on an ntext column, which caused a full table scan.

Solution:

Divide the query into 2 parts: 1) put the selective date filter in a CTE (Common Table Expression), 2) apply all the LIKE search criteria to the CTE.

WITH SearchData (Column1, Column2, Column3, Column4, ........)
AS
(
    SELECT Column1, Column2, Column3, Column4, ...........
    FROM MyTable1 WITH (NOLOCK)
        INNER JOIN MyTable2 WITH (NOLOCK)
            ON MyTable1.Id = MyTable2.Id
    WHERE MyTable1.CreationTime >= '2014-04-27'
      AND MyTable1.CreationTime <= '2014-05-01'
)
SELECT DISTINCT TOP 250 Column1, Column2, Column3, Column4
FROM SearchData
WHERE (ISNULL(Column1, '') LIKE @Column1 + '%' OR @Column1 IS NULL)
  AND (Column2 LIKE @Column2 + '%' OR @Column2 IS NULL)
  ...
  ...
  ...
  ...
  AND (Column10 LIKE '%' + @Column10 + '%' OR @Column10 IS NULL)
  AND @Column1 + @Column2 + @Column3 + ........ @Column10 <> ''
ORDER BY [CreationTime] DESC

It worked for me.

Agrawars
  • I think using a `TOP` clause with a query that has a LIKE condition will always improve performance. So it's the TOP clause rather than the CTE that contributes to the performance gain, since with TOP a full scan is cut short as soon as the required number of records has been found. – Sunil Oct 18 '15 at 20:03
  • As an off-topic note, using WITH(NOLOCK) on every join is very dangerous; you must be very careful with it or you will get duplicated data etc... I see people throwing this hint in as a standard way of optimizing queries; it's not. You should always think twice and review the way tables are written to (inserted/updated) in your application before throwing in a NOLOCK hint. (just saying) – Pedro Figueiredo Dec 18 '15 at 01:21
1

Whilst I agree with the accepted answer that Full Text Indexing would be the best solution, and am by no means advocating the use of leading wildcard searches, if they have to be performed then there are steps that can be taken to make their performance less bad.

Kalen Delaney in the book "Microsoft SQL Server 2008 Internals" says:

Collation can make a huge difference when SQL Server has to look at almost all characters in the strings. For instance, look at the following:

SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'

This may execute 10 times faster or more with a binary collation than a nonbinary Windows collation. And with varchar data, this executes up to seven or eight times faster with a SQL collation than with a Windows collation.
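
A sketch of how that tip might be applied to the original query, assuming the columns are varchar and forcing a binary collation just for the comparison (the collation name is chosen for illustration):

SELECT *
FROM dbo.MyTable
WHERE field1 COLLATE Latin1_General_100_BIN2 LIKE '%' + @DESC + '%'
   OR field2 COLLATE Latin1_General_100_BIN2 LIKE '%' + @DESC + '%'
   OR field3 COLLATE Latin1_General_100_BIN2 LIKE '%' + @DESC + '%';

Note that a binary collation makes the comparison case- and accent-sensitive, so the results can change; the next answer works around that by storing an upper-cased copy of the data.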

Martin Smith
  • More from the book "If you have a varchar column, you can speed this up by forcing the collation as follows: SELECT COUNT(*) FROM tbl WHERE longcol COLLATE SQL_Latin1_General_CP_CI_AS LIKE '%abc%';" – yoel halb Nov 29 '12 at 17:28
  • @yoelhalb you probably meant `_CP1_` not `_CP_` – jazzcat Feb 21 '17 at 11:13
1

If you can't use Full Text Search, you can still make this roughly 10 times faster. Do the following:

1. Add a persisted computed column:

alter table TableName
add CalculatedColumnName as upper(Column1 + '|' + Column2...) collate Latin1_General_100_Bin2
persisted;

2. Add an index on the computed column:

create nonclustered index IDX_TableName_CalculatedColumnName
on TableName(CalculatedColumnName);

3. Change your query text:

select count(*)
from TableName
where CalculatedColumnName like '%' + upper(@ParameterValue) + '%' collate Latin1_General_100_Bin2

Source: http://aboutsqlserver.com/2015/01/20/optimizing-substring-search-performance-in-sql-server

Pavel Samoylenko
0

How about

field1 + field2 + field3 LIKE '%' + @DESC + '%'

or

CONTAINS(field1 + field2 + field3, @DESC)
THEn