40

Does anyone have a LIKE pattern that matches whole words only?

It needs to account for spaces, punctuation, and start/end of string as word boundaries.

I am not using SQL Full Text Search as that is not available. I don't think it would be necessary for a simple keyword search when LIKE should be able to do the trick. However if anyone has tested performance of Full Text Search against LIKE patterns, I would be interested to hear.

Edit:

I got it to this stage, but it does not match start/end of string as a word boundary.

where DealTitle like '%[^a-zA-Z]pit[^a-zA-Z]%' 

I want this to match "pit" but not "spit" in a sentence or as a single word.

E.g. DealTitle might contain "a pit of despair" or "pit your wits" or "a pit" or "a pit." or "pit!" or just "pit".

Michael
  • 8,362
  • 6
  • 61
  • 88
mike nelson
  • 21,218
  • 14
  • 66
  • 75
  • 2
    How can you say that LIKE should be able to do the trick if you don't know how to do what you need done? – Lasse V. Karlsen Mar 26 '11 at 18:25
  • 1
    You should use full text search for this. If it really isn't available (why not? It comes even with Express and Advanced Services) CLR and regular expressions would be a better fit than `LIKE` – Martin Smith Mar 26 '11 at 18:32
  • Why is full text search better than LIKE. Is it faster? Or do you think whole words can't be done with LIKE? – mike nelson Mar 26 '11 at 18:44
  • 1
    Full Text Search comes with a word breaker that breaks the text apart and pre-indexes them. `LIKE` with a leading wildcard will always involve a full scan of all your data. – Martin Smith Mar 26 '11 at 18:50
  • Even inside a sentence, what about cases like: "pit5" or "3pit"? '%[^a-z]pit[^a-z]%' will allow them, and it seems like you didn't intend that. – Helen Craigman Jun 21 '13 at 03:16

9 Answers9

54

Full text indexes is the answer.

The poor cousin alternative is

'.' + column + '.' LIKE '%[^a-z]pit[^a-z]%'

FYI unless you are using _CS collation, there is no need for a-zA-Z

RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • Awesome. Good thinking. The reason I don't want full text indexes is I can't rely on that option being switched on. Is it actually significantly better anyway? – mike nelson Mar 26 '11 at 19:12
  • How can I put it?.. YES YES YES! (most of the time anyway). In simplistic terms, an FTE breaks text columns into words and indexes each word separately. (A normal index will only put one record into the index once) So if the word "pit" appears in 99 records out of 1 million, then it is as good as a normal index that identifies 99 records out of 1 million (without performing any pattern matching), instead of scanning all of them and performing full-string-length comparison (LIKE) for each. It's that good. – RichardTheKiwi Mar 26 '11 at 19:23
  • Ok, thanks! I guess I better do some real world tests to check perf. It is certainly more convenient using LIKE as you can do it on any random field without any setup, so this is useful for an admin search page for example, where you can search any table any field. Thanks. – mike nelson Mar 26 '11 at 19:51
  • 2
    OK, I have tested it on a table with 11,000 rows, searching two columns for three keywords. Using Full Text Index CONTAINS((col1,col2), 'keyword1 AND keyword2 AND keyword3') is 15 milliseconds compared with the LIKE pattern (with three ANDs and three ORs) at 350 milliseconds. So about 25 times faster to use Full Text Index. However, LIKE still seems pretty efficient considering its doing a full table scan so in some cases this is certainly adequate - depending on volume of users. – mike nelson Mar 27 '11 at 01:11
  • That said, when searching NTEXT fields - which requires adding CONVERT(VARCHAR(MAX), column) - this makes it much slower again. – mike nelson Mar 27 '11 at 01:21
  • Full-text search is not equivalent to searching word boundaries. In many cases you may want to search an exact fragment of text (like if it is enclosed in double quotes on Google), respecting word boundaries. CONTAINS and FREETEXT perform many operations on words (like stemming) that may not be desiderable in some cases. – Marco Marsala Nov 25 '15 at 11:15
  • 2
    EXPLANATION regarding `.` (dot) characters: The `[^a-z]` part requires matching exactly a single character. So an empty character will NOT match. For example, the OP's LIKE will not match 'pit', ' pit' or 'pit '. By adding dots, the `[^a-z]` part matches with these dots. – anar khalilov Jan 22 '16 at 13:43
  • @anarkhalilov did not work in SQL Server 2008 R2. Neither with the '#' character. I did get it to work using just the LIKE perdicates - a-zA-Z0-9 – Fandango68 Mar 15 '19 at 00:59
7

you can just use below condition for whitespace delimiters:

(' '+YOUR_FIELD_NAME+' ') like '% doc %'

it works faster and better than other solutions. so in your case it works fine with "a pit of despair" or "pit your wits" or "a pit" or "a pit." or just "pit", but not works for "pit!".

Saleh Mosleh
  • 504
  • 5
  • 12
5

I think the recommended patterns exclude words with do not have any character at the beginning or at the end. I would use the following additional criteria.

where DealTitle like '%[^a-z]pit[^a-z]%' OR 
  DealTitle like 'pit[^a-z]%' OR 
  DealTitle like '%[^a-z]pit'

I hope it helps you guys!

Michele La Ferla
  • 6,775
  • 11
  • 53
  • 79
3

Surround your string with spaces and create a test column like this:

 SELECT t.DealTitle 
 FROM yourtable t
 CROSS APPLY (SELECT testDeal = ' ' + ISNULL(t.DealTitle,'') + ' ') fx1
 WHERE fx1.testDeal LIKE '%[^a-z]pit[^a-z]%'
Eric
  • 81
  • 4
2

If you can use regexp operator in your SQL query..

For finding any combination of spaces, punctuation and start/end of string as word boundaries:

where DealTitle regexp '(^|[[:punct:]]|[[:space:]])pit([[:space:]]|[[:punct:]]|$)'
zzzaaabbb
  • 139
  • 1
  • 10
1

Another simple alternative:

WHERE DealTitle like '%[^a-z]pit[^a-z]%' OR 
      DealTitle like '[^a-z]pit[^a-z]%' OR 
      DealTitle like '%[^a-z]pit[^a-z]'
शेखर
  • 17,412
  • 13
  • 61
  • 117
Helen Craigman
  • 1,443
  • 3
  • 16
  • 25
1

This is a good topic and I want to complement this to someone how needs to find some word in some string passing this as element of a query.

SELECT 
    ST.WORD, ND.TEXT_STRING
FROM 
    [ST_TABLE] ST
LEFT JOIN 
    [ND_TABLE] ND ON ND.TEXT_STRING LIKE '%[^a-z]' + ST.WORD + '[^a-z]%'
WHERE 
    ST.WORD = 'STACK_OVERFLOW' -- OPTIONAL

With this you can list all the incidences of the ST.WORD in the ND.TEXT_STRING and you can use the WHERE clausule to filter this using some word.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Andrey Hartung
  • 765
  • 5
  • 11
0

Try using charindex to find the match:

Select * 
from table 
where charindex( 'Whole word to be searched', columnname) > 0
Nick
  • 138,499
  • 22
  • 57
  • 95
0

You could search for the entire string in SQL:

select * from YourTable where col1 like '%TheWord%'

Then you could filter the returned rows client site, adding the extra condition that it must be a whole word. For example, if it matches the regex:

\bTheWord\b

Another option is to use a CLR function, available in SQL Server 2005 and higher. That would allow you to search for the regex server-side. This MSDN artcile has the details of how to set up a dbo.RegexMatch function.

Andomar
  • 232,371
  • 49
  • 380
  • 404