2

Lets assume we have a full text indexed table with those records:

blabla bla bla 101010,65 blabla bla bla 
blabla bla bla 1012344,34 blabla bla bla 

(The decimal separator in Portuguese is "," not "." as in English)

When we execute a query like:

where contains(field, "101011") or
where contains(field, "1012344")

The full text engine is returning those records because it seems to me that it is rounding the numbers as:

101010,65 becomes 101011
1012344,34 becomes 1012344

Is there any way of avoiding that?

EDIT

Sorry, i forgot to say that the column is a varchar max column and not a currency column. This is happening in this field when it has a float value despite the fact that it is a varchar column

EDIT2

This is not the only data I have in my column. Numbers like those appears frequently on my indexed texts. It is not concatenated. As I said, this is part of the original text and I have done nothing to the original text. I guess this is a behavior of the word breaker, but who knows for sure?

Rafael Colucci
  • 6,018
  • 4
  • 52
  • 121

2 Answers2

1

EDIT:

< Ignore > The reason you are seeing this behaviour is that, the default wordbreakers for SQL fulltext search are defined by the English language (locale 1033). In English, a comma is a valid word-breaker, thereby breaking your number into two different numbers. However, if you use the Portuguese word-breaker, FTS quite cleverly retains the numbers together. Try running the following query on your SQL Server to see how the fulltext engine parses the same input differently depending on the locale specified:

--use locale English
select * from sys.dm_fts_parser('"12345,10"',1033,NULL,0)
--use locale Portuguese
select * from sys.dm_fts_parser('"12345,10"',2070,NULL,0)

< /Ignore >

UPDATE: Alright, I have managed to replicate your scenario and yes it does seem to be default behaviour with SQL Server FTS. However, it only seems to round up to nearest 1/10th of the number (the nearest 10 centavos in your case), and NOT to the nearest whole number.

So for example; 12345,88 would be returned in searches for both 12345,88 as well as 12345,9, while 56789,98 would appear in searches for 56789,98 as well as 56790. However, a number such as 45678,60 will remain intact with no rounding up or down, so it's not as bad as you think.

Not sure if there is anything you can do to change this behaviour though. A quick search on Google returned nothing.

mohdowais
  • 99
  • 1
  • 5
  • The wordbreaker is configured as brazilian, and no, sql is not retaining the number as you said. Sql is finding both the number as it and the number rounded as i said in my post. – Rafael Colucci Jul 11 '11 at 12:56
  • ok so that didn't work for you. It seems to be unlikely behaviour for SQL fulltext, so I would lean towards other explanations, such as the source of the data. Is this the only data you have in your column? Is the data in the fulltext-indexed column perhaps created by concatenating data from several other columns, where data conversion may have taken place? – mohdowais Jul 11 '11 at 17:37
  • This is not the only data I have in my column. Numbers like those appears frequently on my indexed texts. It is not concatenated. As I said, this is part of the original text and I have done nothing to the original text. I guess this is a behavior of the word breaker, but who knows for sure? – Rafael Colucci Jul 11 '11 at 18:00
  • In response to your edit, yes. This is exactly what is happening to me. And yes, this is really bad in my scenario. I also goggled and i found nothing about it, maybe because it is related to the word breaker algorithm and there is nothing we can do about it but implement a custom IWordBreaker. I will accept you answer because you helped me reproduce the issue and now i am sure there is nothing wrong with my server. I hope somebody, someday, will find an way of fixing this issue. – Rafael Colucci Jul 12 '11 at 16:47
0

My suggestion would be to not use the Money data type in the first place. All it buys you is a little formatting ease (which you should be doing at the presentation layer anyway), but brings about other complications and inflexibility. I'm not sure DECIMAL/NUMERIC would solve this particular issue, as I'm not a full-text guy, but I try to steer people away from problematic data types like MONEY whenever I can. See this previous question for lots of discussion about this. Should you choose the MONEY or DECIMAL(x,y) datatypes in SQL Server?

Community
  • 1
  • 1
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Sorry, i forgot to say that the column is a varchar max column and not a currency column. This is happening when in this field when it has a float value despite the fact that it is a varchar column – Rafael Colucci Jul 09 '11 at 18:10