1

The following query takes 1.1s to execute, the EXPLAIN shows the use of a FULLTEXT index:

SELECT SQL_NO_CACHE COUNT(*)
FROM e_entity
WHERE meta_oid=336799 AND MATCH(sIndex07) AGAINST ("#UPR-1393#" IN NATURAL LANGUAGE MODE)

EXPLAIN:
id: 1
select_type: SIMPLE
table: e_entity
type: fulltext
possible_keys: App_Parent,sindex07
key: sIndex07
key_len: 0
ref: (NULL)
rows: 1
extra: Using Where

There's a FULLTEXT index on sIndex07 column. However when this FULLTEXT index is removed and replaced by a usual KEY index then the query:

SELECT SQL_NO_CACHE COUNT(*)
FROM e_entity
WHERE meta_oid=336799 AND sIndex07 LIKE "%#UPR-1393#%"

EXPLAIN:
id: 1
select_type: SIMPLE
table: e_entity
type: ref
possible_keys: App_Parent
key: App_Parent
key_len: 4
ref: const
rows: 331283
extra: Using Where

CREATE TABLE `e_entity` (
`OID` int(11) NOT NULL AUTO_INCREMENT,
`E_E_OID` int(11) DEFAULT NULL,
`UNIQUE_IDX` int(11) NOT NULL,
`APP_OID` int(11) NOT NULL,
`META_OID` int(11) NOT NULL,
`STORE_DATE` datetime NOT NULL,
`REL_DISPLAY` varchar(1024) NOT NULL,
`sIndex01` varchar(1024) NOT NULL,
`SINDEX02` varchar(1024) NOT NULL,
`SINDEX03` varchar(1024) NOT NULL,
`SINDEX04` varchar(1024) NOT NULL,
`SINDEX05` varchar(1024) NOT NULL,
`SINDEX06` varchar(1024) NOT NULL,
`sIndex07` varchar(1024) NOT NULL,
`SINDEX08` varchar(1024) NOT NULL,
`SINDEX09` varchar(1024) NOT NULL,
`sIndex10` varchar(1022) NOT NULL,
`SINDEX11` varchar(1024) NOT NULL,
`SINDEX12` varchar(1024) NOT NULL,
`SINDEX13` varchar(1024) NOT NULL,
`SINDEX14` varchar(1024) NOT NULL,
`sIndex15` varchar(1022) NOT NULL,
`SINDEX16` varchar(1024) NOT NULL,
`SINDEX17` varchar(1024) NOT NULL,
`SINDEX18` varchar(1024) NOT NULL,
`SINDEX19` varchar(1024) NOT NULL,
`SINDEX20` varchar(1024) NOT NULL,
`NINDEX01` double NOT NULL,
`NINDEX02` double NOT NULL,
`NINDEX03` double NOT NULL,
`NINDEX04` double NOT NULL,
`NINDEX05` double NOT NULL,
`NINDEX06` double NOT NULL,
`NINDEX07` double NOT NULL,
`NINDEX08` double NOT NULL,
`NINDEX09` double NOT NULL,
`NINDEX10` double NOT NULL,
`DINDEX01` datetime NOT NULL,
`DINDEX02` datetime NOT NULL,
`DINDEX03` datetime NOT NULL,
`DINDEX04` datetime NOT NULL,
`DINDEX05` datetime NOT NULL,
`DINDEX06` datetime NOT NULL,
`DINDEX07` datetime NOT NULL,
`DINDEX08` datetime NOT NULL,
`DINDEX09` datetime NOT NULL,
`DINDEX10` datetime NOT NULL,
`FREETEXT` mediumtext NOT NULL,
`UID` int(11) DEFAULT NULL,
PRIMARY KEY (`OID`),
KEY `E_E_OID` (`E_E_OID`),
KEY `sIndex01` (`SINDEX01`),
KEY `sIndex02` (`SINDEX02`),
KEY `sIndex03` (`SINDEX03`),
KEY `sIndex04` (`SINDEX04`),
KEY `sIndex05` (`SINDEX05`),
KEY `sIndex06` (`SINDEX06`),
FULLTEXT `sIndex07` (`SINDEX07`),
KEY `sIndex08` (`SINDEX08`),
KEY `sIndex09` (`SINDEX09`),
KEY `sIndex10` (`SINDEX10`),
KEY `sIndex11` (`SINDEX11`),
KEY `sIndex12` (`SINDEX12`),
KEY `sIndex13` (`SINDEX13`),
KEY `sIndex14` (`SINDEX14`),
KEY `sIndex15` (`SINDEX15`),
KEY `sIndex16` (`SINDEX16`),
KEY `sIndex17` (`SINDEX17`),
KEY `sIndex18` (`SINDEX18`),
KEY `sIndex19` (`SINDEX19`),
KEY `sIndex20` (`SINDEX20`),
KEY `dIndex01` (`DINDEX01`),
KEY `dIndex02` (`DINDEX02`),
KEY `dIndex03` (`DINDEX03`),
KEY `dIndex04` (`DINDEX04`),
KEY `dIndex05` (`DINDEX05`),
KEY `dIndex06` (`DINDEX06`),
KEY `dIndex07` (`DINDEX07`),
KEY `dIndex08` (`DINDEX08`),
KEY `dIndex09` (`DINDEX09`),
KEY `dIndex10` (`DINDEX10`),
KEY `nIndex01` (`NINDEX01`),
KEY `nIndex02` (`NINDEX02`),
KEY `nIndex03` (`NINDEX03`),
KEY `nIndex04` (`NINDEX04`),
KEY `nIndex05` (`NINDEX05`),
KEY `nIndex06` (`NINDEX06`),
KEY `nIndex07` (`NINDEX07`),
KEY `nIndex08` (`NINDEX08`),
KEY `nIndex09` (`NINDEX09`),
KEY `nIndex10` (`NINDEX10`),
KEY `rel_display` (`REL_DISPLAY`),
KEY `App_Parent` (`META_OID`),
) ENGINE=InnoDB AUTO_INCREMENT=1245843 DEFAULT CHARSET=utf8    ROW_FORMAT=COMPRESSED

Takes only 0.6s to complete. I have seen in other questions that the MATCH clause needs to be nested but I'm not sure how to nest it in a COUNT statement. Also when removing the meta_oid clause, the query ran using FULLTEXT index runs a 50% faster than the second query, so whereas it seems FULLTEXT is being a benefit I'm struggling when using it in conjunction with the rest of the query.meta_oid is indexed, sIndex07 is varchar(1024) as well and the database is 4.5Gb in size.

EDIT: The reason why the FULLTEXT search was slower was because the search term has a hyphen in it, thus returning a much larger dataset in my particular case than the LIKE operator. A search with no hyphen does use FULLTEXT and performs about a hundred times better than LIKE

I'll award the bounty in less than 24 hours to the one who can make a search with hyphen works without recompiling mysql binaries, thus making FULLTEXT faster which was the original purpose of the question.

Rafael
  • 1,099
  • 5
  • 23
  • 47
  • Please provide `SHOW CREATE TABLE`; we have no idea what column(s) are indexed by `App_Parent`. – Rick James Nov 22 '16 at 21:49
  • 2
    With a table that big, did you run the query twice? This would avoid caching effects, that often skew timings by a factor of 10. – Rick James Nov 22 '16 at 21:50
  • Question edited – Rafael Nov 23 '16 at 05:15
  • Is that "arrays" of things splayed across 4*10 columns? Can't they be moved to 10 rows * 4 columns in another table? – Rick James Nov 23 '16 at 05:52
  • They represent abstract instances of many different object definitions, I know it looks unusual but that's the way it is – Rafael Nov 23 '16 at 08:31
  • How is you MySQL configured? Have you tweaked the indexing settings? Any specific reason you are using "IN NATURAL LANGUAGE MODE" ? – Mathieu de Lorimier Nov 28 '16 at 13:55
  • Seconding, Rick James comment. Did you run the timing multiple times? Although you are using SQL_NO_CACHE, it is highly likely your OS also caches disk reads which can impact on the results. – Steve E. Nov 28 '16 at 22:39
  • Yes I ran the query multiple times – Rafael Nov 29 '16 at 04:18
  • did you try my answe? – e4c5 Dec 02 '16 at 23:54
  • I can't create an extra table just for this particular case. The issue was that the hyphen treated the `MATCH...AGAINST` search as a two word search and therefore the returned dataset was bigger than the one returned by the `LIKE` operator. What I need to do now is to make the `MATCH` search work the same way as the `LIKE` search does – Rafael Dec 03 '16 at 05:54
  • Well you can't because Full Text Search and Like are very different beasts. Anyway, I think what you have here is an XY problem http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem compounded by the fact that you have not normalized your data. – e4c5 Dec 04 '16 at 01:41
  • 1
    I am surprised that you are able to insert anything into this table at all with so many indexes. – e4c5 Dec 04 '16 at 01:41
  • @e4c5 Yes I can if I make mysql treat the hyphen as a simple character, which is possible. I agree with your last comment, actually only `App_Parent` is actually used so the other indexes are redundant, and impact inserts and updates, however this is a different problem from my `FULLTEXT` search question – Rafael Dec 05 '16 at 09:29
  • yes but the solution to your full text search problem also is to fix your table – e4c5 Dec 05 '16 at 10:05
  • I can't agree with that, even if I don't fix my table `FULLTEXT` is a hundred times faster than using `LIKE` as soon as the hyphen is removed, without fixing my table. They are separated issues – Rafael Dec 05 '16 at 10:15

2 Answers2

1

MySQL uses a query planner to determine how best to resolve queries. Usually, only one index is used to resolve the 'WHERE' component and this is chosen from the list of possible indexes that could apply. The list of possible indexes is shown by EXPLAIN under possible_keys, the chosen index is identified by key. To make the choice, MySQL looks at a number of factors such as the uniqueness of the index to try and determine which index will narrow down the list of possible results best.

Once it has narrowed down the list of rows that match within the index. It will Use Where to read those rows and check them against the remaining conditions in the WHERE clause.

There are many edge cases in this operation and sometimes MySQL can make a poor choice. The query planner was significantly changed in MySQL 5.1 and it took a few releases before it became good again.

Without having your data to examine, it's difficult to suggest why MySQL makes the wrong choice. Although doing:

SELECT SQL_NO_CACHE COUNT(*)
FROM e_entity
WHERE MATCH(sIndex07) AGAINST ("#UPR-1393#" IN NATURAL LANGUAGE MODE);

Would tell you how many rows MySQL is reading from the database using the fulltext index. In your original query, it then has to parse all those rows against 'meta_oid=336799' to determine the final count.

SELECT SQL_NO_CACHE COUNT(*)
FROM e_entity
WHERE meta_oid=336799

Would tell you how many actual rows MySQL is reading using the App_Parent index on META_OID. In your second query it then has to parse those rows against LIKE "%#UPR-1393#%"

If the latter query produces a much lower number than the first, then it would explain why it is quicker when App_Parent instead of the fulltext index.

Steve E.
  • 9,003
  • 6
  • 39
  • 57
  • This doesn't explain why using `FULLTEXT` is slower with an `AND` operator than using `LIKE`. Both `FULLTEXT` and `LIKE` return the same number of rows – Rafael Nov 29 '16 at 08:05
  • @Rafael Diaz, in your question, you say that FULLTEXT query without the `meta_oid` clause is 50% faster. The reason for the speed difference is the way MySQL is resolving the query, not the indexes themselves. For a fair comparison, just do FULLTEXT v LIKE with no other WHERE conditions and it would be interesting to know the results. – Steve E. Nov 29 '16 at 10:38
  • The results are indeed: 50% faster execution for the `FULLTEXT` query when there's no `AND` operator on `meta_oid` – Rafael Nov 30 '16 at 08:54
  • That's what I expected. Try forcing the first query to use the same index as the second. It should be faster and demonstrate that sometimes MySQL makes the wrong choice: `SELECT SQL_NO_CACHE COUNT(*) FROM e_entity FORCE INDEX (App_Parent) WHERE meta_oid=336799 AND MATCH(sIndex07) AGAINST ("#UPR-1393#" IN NATURAL LANGUAGE MODE)` – Steve E. Nov 30 '16 at 10:24
  • I just realized the queries are not even equivalent, they return a different number of rows that's why the disparity in times. I'm not sure how to make a `MATCH AGAINST` behave like a `LIKE` – Rafael Dec 01 '16 at 08:49
0

MATCH(sIndex07) needs a FULLTEXT index to work efficiently. Otherwise it is as bad (or worse?) than LIKE '%string%'.

More

Without FULLTEXT, this should make the LIKE run even faster: INDEX(meta_oid=336799, sIndex07)

Steve E.
  • 9,003
  • 6
  • 39
  • 57
Rick James
  • 135,179
  • 13
  • 127
  • 222