I have some doubts when using FULL TEXT INDEX on myISAM tables in mySQL 5.5.
I have table like this:
CREATE TABLE `search_descr` (
`vid` int(11) NOT NULL,
`v_name_text` text COLLATE utf8_polish_ci,
`d_title` text COLLATE utf8_polish_ci,
`d_title_stemmed` text COLLATE utf8_polish_ci,
`r_name` text COLLATE utf8_polish_ci,
`r_name_stemmed` text COLLATE utf8_polish_ci,
`c_title` text COLLATE utf8_polish_ci,
`v_kind` text COLLATE utf8_polish_ci,
`v_kind_stemmed` text COLLATE utf8_polish_ci,
`i_title_pl_text` text COLLATE utf8_polish_ci,
`i_title_latin_text` text COLLATE utf8_polish_ci,
`i_title_text` text COLLATE utf8_polish_ci,
`descr` text COLLATE utf8_polish_ci,
PRIMARY KEY (`vid`),
FULLTEXT KEY `descr_FT_indx` (`descr`),
FULLTEXT KEY `all_FT_indx` (`v_name_text`,`d_title`,`d_title_stemmed`,`i_title_pl_text`,
`i_title_latin_text`,`i_title_text`,`r_name`,`r_name_stemmed`,`c_title`,
`v_kind`,`v_kind_stemmed`,`descr`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_polish_ci;
I insert there my data, all columns are reasonable short except for descr -> AVG length(descr) is about 8000. I insert about 15000 rows.
Now I'd like to count rows meeting my conditions( Condition is that there is a match to any of given words in any column). So i pose a query:
SELECT
COUNT(*)
FROM
(SELECT
vid
FROM
search_descr
WHERE
MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
v_kind , v_kind_stemmed , descr)
AGAINST ('ok* oko* około*' IN BOOLEAN MODE)) a
Which takes 4s to give me an answer (11674 rows).
Another query is much faster:
SELECT
COUNT(*)
FROM
(SELECT
vid
FROM
search_descr
WHERE
MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
v_kind , v_kind_stemmed , descr)
AGAINST ('oko*' IN BOOLEAN MODE) UNION SELECT
vid
FROM
search_descr
WHERE
MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
v_kind , v_kind_stemmed , descr)
AGAINST ('około*' IN BOOLEAN MODE) UNION SELECT
vid
FROM
search_descr
WHERE
MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
v_kind , v_kind_stemmed , descr)
AGAINST ('ok*' IN BOOLEAN MODE)) a
INNER JOIN
search_descr v USING (vid)
It takes less than 0,180s which is reasonable, I believe.
The question is: What am I doing wrong in the first query? Why can't I use MATCH AGAINST with all its benefits (like "AGAINST(+(oko* około*) +(dlaczego) IN BOOLEAN MODE)" and so on)?
NOTE: I know that searching oko* is redundant as results will be completely contained by results of ok*, but it doesn't change the problem. My program may as well generate completely different words
NOTE 2: COUNT is just for now to show the case. In general I need to do something with the data I receive, order it by relevance of columns in which there is a match, by another conditions, add offset and limit.