2

I have a MySQL 5.5.4 table with >200M rows which has a FULLTEXT index on two columns (Title,Body).

When I do a simple FULLTEXT query in the default NATURAL LANGUAGE mode for some popular results (they'd return 2M+ rows), I'm getting zero rows back:

SELECT COUNT(*) FROM itemsearch WHERE MATCH (Title, Body) AGAINST ('fubar');

But when I do a FULLTEXT query in BOOLEAN mode, I can see the rows in question do exist (I get 2M+ back, depending):

SELECT COUNT(*) FROM itemsearch WHERE MATCH (Title, Body) AGAINST ('+fubar' IN BOOLEAN MODE);

I have some queries which return ~500K rows which are working fine in either mode, so if it's result size related, it seems to crop up somewhere between 500K and a little north of 2M.

I've tried playing with the various buffer size variables, to no avail. It's clearly not the 50% threshold, since we're not getting 100M rows back for any result.

Any ideas?

hobodave
  • 2,840
  • 2
  • 24
  • 34
Don MacAskill
  • 1,808
  • 3
  • 16
  • 22
  • Could you give your version Don? Trying to duplicate. – hobodave Mar 05 '11 at 05:19
  • Sorry, should have put that in before. Edited, but it's 5.5.4. Haven't tried 5.5 GA yet, but didn't see anything in the notes that suggested it would make a difference. – Don MacAskill Mar 05 '11 at 08:54
  • Well that took forever to generate that amount of test data. I've definitely duplicated it now. Seems like a bug; doing some digging. – hobodave Mar 07 '11 at 04:38

2 Answers2

1

The behavior you see is by design. When using the natural language query (NLQ) if the total number of documents containing the term is greater than 2 million, then the weight will be 0.

This is done on the following lines of /storage/myisam/ft_nlq_search.c:

gweight=word->weight*GWS_IN_USE;
if (gweight < 0 || doc_cnt > 2000000)
  gweight=0;

According to revision 1346.322.1 this was a bugfix to prevent "tree overflow".

It looks like you're going to have to switch to Sphinx or a Lucene solution.

hobodave
  • 2,840
  • 2
  • 24
  • 34
-1

Its likely that these columns are not being indexed. The relevant section of the manual:

The MATCH() column list must match exactly the column list in some FULLTEXT index definition for the table, unless this MATCH() is IN BOOLEAN MODE. Boolean-mode searches can be done on nonindexed columns, although they are likely to be slow.

This would explain why you're seeing different behavior in Boolean mode.

Andrew M.
  • 11,182
  • 2
  • 35
  • 29
  • If that were the case it would return the error `ERROR 1191 (HY000): Can't find FULLTEXT index matching the column list`. Not to mention Don opens with "I have ... a FULLTEXT index on two columns (Title,Body)" – hobodave Mar 05 '11 at 05:14
  • The documentation doesn't indicate that, unfortunately. Guess that's a -1 against MySQL's docs as well. :( Also... wouldn't be the first time a poster has thought they set something but did not. – Andrew M. Mar 05 '11 at 16:53