1

I stumbled accross an issue with full text search in columns which may contains domain names on a Microsoft SQL Server 2012.

A table cell containing example.com is added to the full text catalog by the term example and com. The latter makes it impossible to search for a domain name, as any entry containing that tld will be found.

What do you need to do in order to prevent urls from being broken?

EDIT: The example query would be:

SELECT * FROM Test WHERE FREETEXT(test, 'example.com')

The Test table only contains two rows, containing example.comand differenturl.com, both are returned as a result. The Test database was created for this example.

Keith
  • 20,636
  • 11
  • 84
  • 125
sk904861
  • 1,247
  • 1
  • 14
  • 31
  • Why you don't just use varchar for saving domains ? – Alker Jul 16 '13 at 11:02
  • These domains appear within text, e.g. "I like example.com". – sk904861 Jul 16 '13 at 11:13
  • Are you sure it's broken? I've confirmed this query yields results (both with and without stop word lists defined on the index): `SELECT * FROM YourTable WHERE Contains(*, 'example.com')` – Keith Jul 16 '13 at 16:28
  • Have you tried searching for `example.com` having other entries like `url.com`? My problem is, that also the latter shows up when searching for `example.com`, as also the `url.com` entry is referred to by just `com` by the full text catalog. – sk904861 Jul 16 '13 at 18:07
  • I'm not seeing this behavior. If I search for `example.com` then I only get results with `example.com`. Same for `url.com`. – Keith Jul 17 '13 at 16:09
  • Can you post your full text query? – Keith Jul 17 '13 at 16:10
  • Thank you for your effort. I just updated my question. – sk904861 Jul 17 '13 at 18:42

2 Answers2

2

Use CONTAINS instead of FREETEXT.

SELECT * FROM Test WHERE CONTAINS(test, 'example.com')

FREETEXT will treat example.com as if it is example OR com which explains why your FREETEXT query for example.com matches url.com. It will also match inflectional forms (examples, exampling...) and synonyms which would likely cause other problems for you.

If you still need the flexibility of FREETEXT for your other search terms you could use both functions:

SELECT * FROM Test WHERE CONTAINS(test, 'example.com') and FREETEXT(test, 'some other text')

Keith
  • 20,636
  • 11
  • 84
  • 125
  • Sadly, `CONTAINS` does not allow multiple terms in one query, which is an essential part of full text magic in my case. – sk904861 Jul 18 '13 at 15:38
  • It does but you have to use boolean syntax. `SELECT * FROM Test WHERE CONTAINS(test, 'example.com AND (something OR other)')` – Keith Jul 18 '13 at 17:45
  • Unfortunately I forgot to mention in my question, that `CONTAINS` still leaves another problem (can't search for `C&A`). Your answer is being accepted, though. – sk904861 Jul 19 '13 at 10:37
  • `&` is a boolean operator in CONTAINS. You can work around it by putting the term in double quotes: `SELECT * FROM Test WHERE CONTAINS(test, '"C&A" & example.com')` – Keith Jul 19 '13 at 14:00
  • I also extended my answer with an example of how you can still use FREETEXT for your other terms in conjunction with CONTAINS for the domain. – Keith Jul 19 '13 at 14:01
-2

This:

SELECT TOP 1000 
[Domain1]
FROM [TESTIT].[dbo].[DomainTest] where Domain1 like '%example%com';
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
Alker
  • 61
  • 9