5

I am experiencing a very weird behaviour from the management studio GO.
I am following the full-Text Search tutorial and I run the following query over adventure works

 SELECT FT_TBL.ProductDescriptionID,
 FT_TBL.Description, 
 KEY_TBL.RANK
 FROM Production.ProductDescription AS FT_TBL INNER JOIN
    CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
    ) AS KEY_TBL
    ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
 WHERE KEY_TBL.RANK > 2
 ORDER BY KEY_TBL.RANK DESC;

When I run it with Go (before the query ) and without Go and the results are different. I played with it some more - copied the select and added go between and the results are still different. Notice from the picture that it is the same select written twice but with two different results

enter image description here

Any idea?

EDIT: Just discovered the execution plan is different because of the go
------LINK----------

enter image description here

Bick
  • 17,833
  • 52
  • 146
  • 251
  • 2
    Which result is incorrect? Is it repeatable consistently? – gbn May 27 '13 at 14:57
  • Are the strings "ligth" or "ligthweight" on the rows with `ProductDescriptionID` 661 and 1062?, because on the image of the first results you posted they don't seem to contain them. – Lamak May 27 '13 at 15:01
  • Yes they are correct. the string does contain "lightweight". Here it is "Made from the same aluminum alloy as our top-of-the line HL frame, the ML features a lightweight down-tube milled to the perfect diameter for optimal strength. Women's version." – Bick May 27 '13 at 15:04
  • Added the execution plan(with high res link) - different for the same selects. – Bick May 27 '13 at 15:05
  • 1
    I can't reproduce this on AdventureWorks2008, the results are identical – WrinkleFree May 27 '13 at 17:38
  • No repro for me on `Microsoft SQL Server 2012 (SP1) - 11.0.3128.0 (X64)` [screenshot](http://i.stack.imgur.com/w9frq.png). Can you upload the XML version of the actual execution plans? – Martin Smith May 27 '13 at 19:59
  • @MartinSmith I can reproduce on the same version with blank lines. Try removing the blank line above the second query to match what I have. I didn't add my answer without testing – gbn May 27 '13 at 21:41
  • 2
    @gbn can't reproduce but by design anyway IMO. The rank values only have significance for relative ordering within a result set and no absolute meaning. – Martin Smith May 27 '13 at 23:48
  • you can remove the where clause on the rank and the problem remains. – Bick May 28 '13 at 05:22
  • @rails - So please upload the XML version of the actual execution plans not just the images. – Martin Smith May 28 '13 at 06:24

2 Answers2

6

I can reproduce this

Initial tests

Assuming a GO..SELECT..GO pattern and some blank lines

Before     After    Rows
y          n        6
y          y        4
n          y        4
n          n        4

If I change the order of events...

Before     After    Rows
y          n        6
y          y        4
n          n        4
n          y        4

And again

Before     After    Rows
y          y        4
n          n        4
n          y        4
y          n        6

Before     After    Rows
y          y        4
n          n        4
y          n        6
n          y        4

Update after @MartinSmith's answer

If I remove the WHERE, then I get the same results and the same rank value for four queries regardless of blank lines. With the WHERE I get the same results as above.

Martin showed that rank is arbitrary, but his same link also states

This explains why the same query can return different rank results over time as full-text indexed data is added, modified, and deleted, and as the smaller indexes are merged.

In this case, data is static and the query is identical except for blank lines.

Changing the query text

  • With WHERE KEY_TBL.RANK > 0 = identical row count and rank values
  • No WHERE clause = identical row count and rank values
  • With the CONTAINSTABLE wrapped in TOP 200000..ORDER BY RANK = identical row count and rank values, regardless of WHERE

With original WHERE KEY_TBL.RANK > 2

  • Change the case of the first FT_TBL alias to be different each time (fT_TBL, Ft_TBL, FT_tBL, FT_TbL) which means a different query plan = identical row count and rank values
  • Changing all FT_TBL aliases to FT_TBl = identical row count and rank values
  • Changing all FT_TBl aliases back to FT_TBL = different rank values based on blank lines -> different row counts

For the last query (different rank values) I get 4 identical execution plans. The filter is applied to the TVF before the JOIN.

  • Adding a blank line after the ON before the WHERE (keeping leading and trailing blank lines) = identical row count and rank values
  • Adding a blank line before FROM = identical row count and rank values
  • Adding 3 spaces after JOIN = identical row count and rank values

Adding DBCC DROPCLEANBUFFERS

  • Adding DBCC DROPCLEANBUFFERS at the top 4-4-6-4 rows
  • Same if I add DBCC DROPCLEANBUFFERS before each query
  • Adding an extra blank lines (with the DBCC) before or after the first query gives more rows (6-4-6-4). Found this by mistake, but leads back to the first observation that blank lines affect results

After Paul's comment

Note the trailing and leading blank lines.

This fails reliably. The second query gives 4 rows

GO

SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO
SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO

SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO
SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO

This gives the same every time after I shift+TAB the entire query to remove indents

GO

SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO
SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO

SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO
SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO

Still looking

Ok, time to search MS Connect. New one: https://connect.microsoft.com/SQLServer/feedback/details/788691/blank-lines-before-and-after-full-text-query-change-results

gbn
  • 422,506
  • 82
  • 585
  • 676
  • Assuming you meant `11.0.3128.0 (X64)` then same version as me. I don't see any difference in results. Describing it as "failing" seems incorrect to me. It might be informative to know why the rank value can differ but at the end of the day it is not guaranteed to remain the same. It may just depend on the order the index is processed. Seems just like running 4 similar queries without an `ORDER BY` and complaining that one returns in a different order. – Martin Smith May 29 '13 at 07:25
  • @MartinSmith: I see that the Rank value differs on static data with almost identical queries. By almost I mean blank lines above and below. This is the curious part. – gbn May 29 '13 at 07:49
  • @gbn - I agree it is curious. On the query with the lower rank do you get the sort before the join as in the OP? Is there any difference in the execution plans regarding the parameters to the full text index table valued function or cardinality estimates? (I see the estimated costs are different in the OP) – Martin Smith May 29 '13 at 07:52
  • @MartinSmith: XML plans are the same except for the ActualRows and the trailing CRLF on StatementText. The only difference – gbn May 29 '13 at 08:06
1

I can't reproduce this on my machine but suspect this is by design with the error being filtering WHERE KEY_TBL.RANK > 2

The topic How Search Query Results Are Ranked (Full-Text Search) states

This rank value is calculated on every row and can be used as an ordering criteria to sort the result set of a given query by relevance. The rank values indicate only a relative order of relevance of the rows in the result set. The actual values are unimportant and typically differ each time the query is run. The rank value does not hold any significance across queries.

In the second query all values seem to be scaled down to FLOOR(20% * RANK) from the first query. As this is only a relative scale filtering on an absolute RANK value seems likely to cause this kind of issue where matching rows arbitrarily drop in and out of the results.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Thanks. I am aware of the rank re-caclculation. I have removed the where and this continued to appear. – Bick May 28 '13 at 05:21
  • Without the WHERE, I can't reproduce this but the rank values are all the same. With the WHERE, the rank value differs so the filter changes, but it depends on blank lines. – gbn May 28 '13 at 06:39
  • @gbn - And when you **can** repro it do you get the same actual number of rows coming out of the table valued function or does that differ? (i.e. are they being eliminated by the filter or the full text query itself?) – Martin Smith May 28 '13 at 06:43
  • @MartinSmith: still investigating. From you link though "This explains why the same query can return different rank results over time as full-text indexed data is added, modified, and deleted, and as the smaller indexes are merged.". Data is static here. I also want to try TOP 2000000..ORDER BY to force materialisation inside – gbn May 28 '13 at 06:45