0

I inherited the codebase for a custom CMS built with MySQL and PHP which uses fulltext indexes to search in content (text) fields. When analyzing the database structure I found that all relevant tables were created in the following fashion (simplified example):

CREATE TABLE `stories` (
  `story_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `headline` varchar(255) NOT NULL DEFAULT '',
  `subhead` varchar(255) DEFAULT NULL,
  `content` text NOT NULL,
  PRIMARY KEY (`story_id`),
  FULLTEXT KEY `fulltext_search` (`headline`,`subhead`,`content`),
  FULLTEXT KEY `headline` (`headline`),
  FULLTEXT KEY `subhead` (`subhead`),
  FULLTEXT KEY `content` (`content`)  
) ENGINE=MyISAM;

As you can see, the fulltext index is created in the usual way but then each column is added individually as well, which I believe creates two different indexes.

I've contacted the prior developer and he says that this is the "proper" way to create fulltext indexes, but according to every single example I've found in the Internet, there's no such requirement and this would be enough:

CREATE TABLE `stories` (
  `story_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `headline` varchar(255) NOT NULL DEFAULT '',
  `subhead` varchar(255) DEFAULT NULL,
  `content` text NOT NULL,
  PRIMARY KEY (`story_id`),
  FULLTEXT KEY `fulltext_search` (`headline`,`subhead`,`content`)
) ENGINE=MyISAM;

The table has over 80,000 rows and is becoming increasingly hard to manage (the full database is near to 10GB) so I'd like to get rid of any unnecessary data.

Many thanks in advance.

Proteo
  • 11
  • 4
  • how do you query this or these indexes that is directly relational to the answer. Do you for example have queries that only search the content alone? – Steve Mar 08 '13 at 15:59
  • Actually yes, the search facility allows the user to define what fields will be used to perform the search, but according to my knowledge (and the relevant PHP code in the CMS) that's handled with the MATCH() statement, for example: SELECT * FROM stories WHERE MATCH(content).. – Proteo Mar 08 '13 at 16:07

1 Answers1

0

The way to figure it out for yourself is to use EXPLAIN with the queries (matches) to see what indexes are actually used. If you have a query that doesn't use an index and is slow, make an index (or tell it manually to USE an index_hint), then try the EXPLAIN again to see if the index gets used.

I would expect that if your users are allowed to specify just one column to search on, and that column isn't first or the only one in the list of indexed columns, the query/match would use a non-indexed sequential search. In other words, with your index on (headline,subhead,content) I would expect the index to be used for any search with all three columns, or with just the headline, or with headline and subhead, but not for just subhead, and not for just content. I haven't done it in a while, so something might be different nowadays; but EXPLAIN should reveal what is going on.

If you examine all the possible queries with EXPLAIN and find that an index isn't used by any of them, you don't need it.

Terry
  • 1