2

I have some doubts when using FULL TEXT INDEX on myISAM tables in mySQL 5.5.

I have table like this:

    CREATE TABLE `search_descr` (
      `vid` int(11) NOT NULL,
      `v_name_text` text COLLATE utf8_polish_ci,
      `d_title` text COLLATE utf8_polish_ci,
      `d_title_stemmed` text COLLATE utf8_polish_ci,
      `r_name` text COLLATE utf8_polish_ci,
      `r_name_stemmed` text COLLATE utf8_polish_ci,
      `c_title` text COLLATE utf8_polish_ci,
      `v_kind` text COLLATE utf8_polish_ci,
      `v_kind_stemmed` text COLLATE utf8_polish_ci,
      `i_title_pl_text` text COLLATE utf8_polish_ci,
      `i_title_latin_text` text COLLATE utf8_polish_ci,
      `i_title_text` text COLLATE utf8_polish_ci,
      `descr` text COLLATE utf8_polish_ci,
      PRIMARY KEY (`vid`),
      FULLTEXT KEY `descr_FT_indx` (`descr`),
      FULLTEXT KEY `all_FT_indx` (`v_name_text`,`d_title`,`d_title_stemmed`,`i_title_pl_text`,
         `i_title_latin_text`,`i_title_text`,`r_name`,`r_name_stemmed`,`c_title`,
         `v_kind`,`v_kind_stemmed`,`descr`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_polish_ci;

I insert there my data, all columns are reasonable short except for descr -> AVG length(descr) is about 8000. I insert about 15000 rows.

Now I'd like to count rows meeting my conditions( Condition is that there is a match to any of given words in any column). So i pose a query:

      SELECT 
          COUNT(*)
      FROM
          (SELECT 
              vid
          FROM
              search_descr
          WHERE
              MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
    i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
    v_kind , v_kind_stemmed , descr)
    AGAINST ('ok* oko* około*' IN BOOLEAN MODE)) a

Which takes 4s to give me an answer (11674 rows).

Another query is much faster:

          SELECT 
              COUNT(*)
          FROM
              (SELECT 
                  vid
              FROM
                  search_descr
              WHERE
                  MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
     i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
     v_kind , v_kind_stemmed , descr)
     AGAINST ('oko*' IN BOOLEAN MODE) UNION SELECT 
                  vid
              FROM
                  search_descr
              WHERE
                  MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
      i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
      v_kind , v_kind_stemmed , descr)
      AGAINST ('około*' IN BOOLEAN MODE) UNION SELECT 
                  vid
              FROM
                  search_descr
              WHERE
                  MATCH (v_name_text , d_title , d_title_stemmed , i_title_pl_text ,
      i_title_latin_text , i_title_text , r_name , r_name_stemmed , c_title ,
      v_kind , v_kind_stemmed , descr)
      AGAINST ('ok*' IN BOOLEAN MODE)) a
                  INNER JOIN
              search_descr v USING (vid)

It takes less than 0,180s which is reasonable, I believe.

The question is: What am I doing wrong in the first query? Why can't I use MATCH AGAINST with all its benefits (like "AGAINST(+(oko* około*) +(dlaczego) IN BOOLEAN MODE)" and so on)?

NOTE: I know that searching oko* is redundant as results will be completely contained by results of ok*, but it doesn't change the problem. My program may as well generate completely different words

NOTE 2: COUNT is just for now to show the case. In general I need to do something with the data I receive, order it by relevance of columns in which there is a match, by another conditions, add offset and limit.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
M. Klecha
  • 21
  • 3
  • `SHOW VARIABLES LIKE 'ft%';` – Rick James Jan 22 '16 at 21:02
  • 'ft_boolean_syntax', '+ -><()~*:\"\"&|' 'ft_max_word_len', '84' 'ft_min_word_len', '1' 'ft_query_expansion_limit', '20' 'ft_stopword_file', '(built-in)' – M. Klecha Jan 25 '16 at 08:35
  • Perhaps most of the 15K rows contain words starting with 'ok'? I don't know about Polish, but in English, there might be 1K hits. – Rick James Jan 25 '16 at 19:31
  • I did some experiments (that may or may not compare well to your case). I found that InnoDB's FULLTEXT was faster_ for searching for 'ok*'. Have you tried InnoDB? – Rick James Jan 25 '16 at 19:39
  • @Rick James InnoDB does not support full-text indexes in mySQL 5.5, but since 5.6. I'm not able to upgrade my DB, so I believe I'm stuck with myISAM. ok* matches to over 11K results, but counting 11K results shouldn't take so long, even though it's almost entire table, right? – M. Klecha Jan 28 '16 at 09:22
  • A _guess_: It finds all the words matching 'ok*', then scans the entries in the FT index for each such word, gathers the ids, dedups, then finally counts. Still a lot of work. – Rick James Jan 28 '16 at 16:59

0 Answers0