
I'm using MySQL/InnoDB. I have a table News which currently has about 3,000,000 records, and I expect it to gain another 7,000,000. Currently I use the following query to fetch records from the News table with an exact phrase in their title or content:

SELECT * FROM News WHERE MATCH (title, content) AGAINST ('"My Phrase"' in Boolean Mode)

But the results aren't satisfactory. For me, when searching for an exact phrase, it's logical to see the latest news with that phrase first. I don't need the results to be sorted by relevance. I don't mind if the first record found has 1 occurrence of my phrase and the second one has 10. All I care about is ORDER BY time_added or ORDER BY id. So I tried:

SELECT * FROM News WHERE MATCH (title, content) AGAINST ('"My Phrase"' in Boolean Mode) ORDER BY id DESC limit 40

But when I EXPLAIN this query, I see that it is Using filesort, as I expected.

Is this simply the way MySQL works? Can I get the results sorted by time in an optimal way?

Edit:

MySQL version: 5.6.34

News table structure is as follows:

CREATE TABLE `News` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(400) COLLATE utf8_bin NOT NULL,
 `url_title` varchar(80) COLLATE utf8_bin NOT NULL,
 `link` varchar(2000) COLLATE utf8_bin NOT NULL,
 `link_unique_index` varchar(255) COLLATE utf8_bin DEFAULT NULL,
 `source` smallint(5) unsigned NOT NULL,
 `category` smallint(5) unsigned NOT NULL DEFAULT '0',
 `description` varchar(600) COLLATE utf8_bin NOT NULL,
 `has_content` tinyint(1) NOT NULL DEFAULT '0',
 `content` text COLLATE utf8_bin NOT NULL,
 `has_image` tinyint(1) NOT NULL,
 `image` varchar(14) COLLATE utf8_bin NOT NULL,
 `image_orientation` char(1) COLLATE utf8_bin NOT NULL,
 `original_image` varchar(2000) COLLATE utf8_bin NOT NULL,
 `keywords` varchar(300) COLLATE utf8_bin NOT NULL,
 `year_added` smallint(4) NOT NULL,
 `date_added` date NOT NULL,
 `time_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `time_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `number_of_views` int(11) NOT NULL DEFAULT '0',
 `last_view_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `number_of_positives` int(11) NOT NULL DEFAULT '0',
 `number_of_negatives` int(11) NOT NULL DEFAULT '0',
 `number_of_votes` int(11) NOT NULL DEFAULT '0',
 `number_of_suggestions` int(11) NOT NULL DEFAULT '0',
 `last_suggestion_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`id`),
 UNIQUE KEY `lui` (`link_unique_index`) USING BTREE,
 KEY `st` (`source`,`time_added`),
 KEY `snv` (`source`,`number_of_views`),
 KEY `snvo` (`source`,`number_of_votes`),
 KEY `sct` (`source`,`category`,`time_added`),
 KEY `scnv` (`source`,`category`,`number_of_views`),
 KEY `scnvo` (`source`,`category`,`number_of_votes`),
 KEY `ct` (`category`,`time_added`),
 KEY `cnv` (`category`,`number_of_views`),
 KEY `cnvo` (`category`,`number_of_votes`),
 KEY `hicdnv` (`has_image`,`category`,`date_added`,`number_of_views`),
 KEY `sdnv` (`source`,`date_added`,`number_of_views`),
 KEY `sdnvo` (`source`,`date_added`,`number_of_votes`),
 KEY `scdnv` (`source`,`category`,`date_added`,`number_of_views`),
 KEY `scdnvo` (`source`,`category`,`date_added`,`number_of_votes`),
 KEY `cdnv` (`category`,`date_added`,`number_of_views`),
 KEY `cdnvo` (`category`,`date_added`,`number_of_votes`),
 KEY `dnv` (`date_added`,`number_of_views`),
 KEY `dnvo` (`date_added`,`number_of_votes`),
 KEY `clst` (`category`,`last_suggestion_time`) USING BTREE,
 KEY `slst` (`source`,`last_suggestion_time`) USING BTREE,
 KEY `nv` (`number_of_views`) USING BTREE,
 KEY `nvo` (`number_of_votes`) USING BTREE,
 KEY `t` (`time_added`) USING BTREE,
 KEY `lvt` (`last_view_time`) USING BTREE,
 FULLTEXT KEY `title_content` (`title`,`content`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=COMPACT
– mrmowji

1 Answer


Try this, using a tip I found from Rolando on DBA.stackexchange:

SELECT News.*
FROM (
    SELECT id
    FROM News
    WHERE MATCH (title, content) AGAINST ('"My Phrase"' in Boolean Mode)
) matches
INNER JOIN News
    ON News.id = matches.id
ORDER BY News.id DESC
LIMIT 40
– Willem Renzema
  • If the goal was to avoid "filesort", I suspect this fails. It probably also says "using temporary". Let's see both `EXPLAIN`s. OTOH, what this query provides is looking at the `*` only 40 times, not "as many as needed by the `MATCH`". So it is likely to be faster. – Rick James Nov 27 '17 at 21:38
  • As @RickJames said, it's `Using temporary; Using filesort`. I don't know if it's as fast as I can get. It's a great method for small tables, but I don't recommend it for large tables. It's obvious that for the most frequently used phrases, it gets slower. One of my test cases with such a phrase took 300 seconds before I killed the process. I would vote up your answer but I can't because of my reputation. – mrmowji Nov 28 '17 at 11:09
  • @Mowji There are a few other tips linked from that dba.stackexchange.com post. I suggest going through them and giving them a shot. In particular, see if switching to just normal indexes and LIKE improves performance (that idea is sketched after these comments). In the end, `Using temporary; Using filesort` doesn't necessarily mean the query is going to perform slowly, so focus more on actual performance than simply getting rid of that from the explain plan. – Willem Renzema Nov 28 '17 at 19:06
  • @Mowji - 300 seconds -- was that with under 40 rows matching? Or over? – Rick James Nov 29 '17 at 00:32
  • @WillemRenzema I will test other methods and I will let you know. – mrmowji Nov 29 '17 at 16:29
  • @RickJames I limited the query for 40 rows and I got 300 seconds (at least). – mrmowji Nov 29 '17 at 16:31
  • @Mowji - I am not referring to `LIMIT 40`, but rather how many rows actually match. `LIMIT` may _or may not_ be optimized such that it looks only at 40 rows. Continued... – Rick James Nov 29 '17 at 17:20
  • Since there are more than 40, Willem's solution brings efficiency to the table. – Rick James Nov 29 '17 at 17:21
  • @RickJames Sorry, I misunderstood you before. I didn't get the results, but I expect there to be at least about 1 million records with that match. – mrmowji Nov 29 '17 at 17:29
  • A million records: I _think_ it must fetch the million _full_ (`*`) records, sort them, then deliver 40. With Willem's answer, it fetches a million `ids` (much less bulky), sorts them, picks 40, then gets _only_ 40 full records. More steps, but less stuff to shovel around. – Rick James Nov 29 '17 at 17:44
  • @RickJames Yes, Willem's answer meets the minimum requirements, but it still suffers from those MORE STEPS (`Using temporary; Using filesort`). – mrmowji Nov 29 '17 at 19:13
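
To make the LIKE suggestion from the comments concrete, here is a minimal sketch (an assumption, not a verified fix): with ORDER BY id DESC LIMIT 40 and no index usable for the WHERE clause, MySQL can walk the PRIMARY KEY newest-first and stop as soon as 40 rows match. That is cheap precisely in the common-phrase case (around a million matches); for rare phrases it degrades towards a full table scan, so it complements rather than replaces the full-text query.

-- Sketch of the LIKE-based alternative: no full-text index involved.
-- Common phrases can terminate early while scanning the PRIMARY KEY backwards;
-- rare phrases may force reading most of the table.
SELECT *
FROM News
WHERE title LIKE '%My Phrase%'
   OR content LIKE '%My Phrase%'
ORDER BY id DESC
LIMIT 40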