3

For some background information, I'm creating an application that searches against a couple of indexed tables to retrieve some records. It isn't overtly complex to the point of say Google, but it's good enough for the purpose it serves, barring this strange issue.

I'm using the Contains() function, and it's going very well, except when the search contains strings of numbers. Now, I'm only passing in a string -- nowhere numerical datatypes being passed in -- only characters. We're searching against a collection of emails, each appended with a custom ID when shot off from a workflow. So while testing, we decided to search via number strings.

In our test, we isolated a number 0042600006, which belongs to one and only one email subject. However, when using our query we are getting results for 0042600001, 0042600002, etc. The query is this as follows (with some generic columns standing in):

SELECT description, subject FROM tableA WHERE CONTAINS((subject), '0042600006')

We've tried every possible combination: '0042600006*', '"0042600006"' and '"0042600006*"'.

I think it's just a limitation of the function, but I thought this would probably be the best place for answers. Thanks in advance.

Michał Powaga
  • 22,561
  • 8
  • 51
  • 62
EHorodyski
  • 774
  • 1
  • 8
  • 30
  • Strange. Are you sure you havent customised the stop words or stemming settings for the index? Contains should be returning an exact match only in this case. Check the query being issued is exactly what you think it is. To me, the extra brackets around 'subject' suggest you might have simplified this for stackoverflow, and it might not be representative of whats actually running – carpii Nov 05 '11 at 01:57
  • It's not very far off (and sorry for such a late response, I really only come here at work). Normally we use `(subject,description)` as our columns list, but in order to see if it was in the description (which isn't visible in our program) I limited the search to just the `subject` column and was still getting false results. I know for a fact it should only be returning one record in the case of only searching the `subject` table. I forget if I mentioned it before, but I did try doing something similar on our other environments and I got the correct result. Maybe it's something with indexing. – EHorodyski Nov 07 '11 at 16:13
  • I don't have a 2008 instance on hand to test out, but I found an interesting function that might shed some light on the problem. Can your dba try [sys.dm_fts_parser](http://msdn.microsoft.com/en-us/library/cc280463.aspx), to see what query is actually being passed to the index? – 3martini Nov 09 '11 at 22:27
  • (Sorry for the delayed response) Let me forward this to my DBA and see what comes up. Thanks! – EHorodyski Nov 13 '11 at 23:57
  • 1
    BookOnline suggests an nvarchar parameter should be used, so it would be interesting to see whether you got the same results by declaring an nvarchar parameter with the value 0042600006 and passing that parameter into the CONTAINS function – Steve Henderson Nov 14 '11 at 17:27
  • Have you tried to cast the number to a string explicitely, cast('0042600006' as nvarchar(max))? It might be that SQL Server sees '0042600006' as an integer and then only takes the first 9 characters. I have not tried this yet, but it might give you a clue... – Geert Immerzeel Nov 22 '11 at 08:14
  • Is the number surrounded by white space or is it in with other characters which would make the 'word' it is part of not just the number itself? – Brian Nov 23 '11 at 10:53
  • I hate to say this, especially as there seems to be so much positive feedback to my first real StackOverflow question, but we still haven't had the time to work on this! Like I mentioned before, it works fine in our other test environment, and right now we have the business testing the app. We're piggybacking off a 3rd party plugin that created these indexed tables and their program shows those same incorrect results, so I'm leaning towards some kind of indexing issue with our one environment. But aside from that we're in our busy season so I apologize for leaving you all hanging... – EHorodyski Nov 23 '11 at 14:21
  • @DrFeelgood running into the same problem. did you finally found a suitable solution? – gsharp Nov 12 '13 at 09:04

3 Answers3

1

Asked this same question recently. Please see the insightful answer someone left me here

Essentially what this user says to do is to turn off the noise words (Microsoft has included integers 0-9 as noise in the Full Text Search). Hope you can use this awesome tool with integers as I now am!

Community
  • 1
  • 1
Rachael
  • 1,965
  • 4
  • 29
  • 55
  • Hmm, we have some searches that use '000007658' and the like. How small of numbers are considered noise? – EHorodyski Oct 25 '12 at 19:08
  • Even '0' - '9'. I am pretty sure it won't index any string that doesn't have at least a letter attached, after a LOT of testing. I've got rows with integers that look like [ 2 45 3 19 2012] where everything in there is successfully located. It knows that what you put in single quotes is the full index to search by, delimited by a space. I ended up just turning off the StopList in "properties" on the full text index. Very very fast with millions and millions of rows. Much MUCH faster than `LIKE`, or using stored procedures to tie everything together. – Rachael Oct 25 '12 at 19:28
  • To clarify additionally, it's not going to have anything to "grab onto" and index if you've got any sort of non-text character inclusive string if you don't remove the integers from the StopList – Rachael Oct 25 '12 at 20:03
1

try to add language 1033 as an additional parameter. that worked with my solution.

SELECT description, subject FROM tableA WHERE CONTAINS((subject), '0042600006', language 1033)
gsharp
  • 27,557
  • 22
  • 88
  • 134
-1

try using SELECT description, subject FROM tableA WHERE CONTAINS((subject), '%0042600006%')

Naval
  • 344
  • 2
  • 7
  • To update, trying to cast it as NVARCHAR didn't help, nor did using front and back wildcards. The general consensous around the office is that it may be a configuration issue either in the SQL Server or CRM, especially considering it only happens in one environment opposed to all. – EHorodyski Dec 09 '11 at 13:42
  • try using '*' in place of '%' for the wild cards – Naval Dec 18 '11 at 05:42