5

I have 2 myISAM tables, called 'tests' and 'completed_tests', one with 170 entries and the other with 118k entries. When I run this query:

SELECT ct.archive, ct.status, ct.score, ct.users_LOGIN, t.lessons_ID, t.content_ID, t.keep_best 
 FROM completed_tests ct,tests t 
WHERE ct.status != 'deleted' and ct.status != 'incomplete' and t.id=ct.tests_ID and t.lessons_ID=10;

Then it takes around 30 second to accomplish. Subsequent calls to the same query, or related queries (different lessons_ID for example), are much faster. They remain faster even if I reset the query cache or restart the mysql server. I suppose this means that the tables are cached into memory (and stay there). My problem is that this specific query seems to be causing problems on high traffic sites that run this application (I guess because the server is slow on memory and emptying its cache?). My questions are:

  • Is there a way to consistently replicate the 30'' delay on my system, so I can try optimizing the query? Should I, for example, empty my system's cache?
  • Is there a way to optimize the query above? Running a explain gives:

Running explain gives:

mysql> explain SELECT ct.archive, ct.status, ct.score, ct.users_LOGIN, t.lessons_ID, t.content_ID, t.keep_best FROM completed_tests ct,tests t WHERE ct.status != 'deleted' and ct.status != 'incomplete' and t.id=ct.tests_ID and t.lessons_ID=10;
+----+-------------+-------+------+-----------------+----------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys   | key      | key_len | ref           | rows | Extra       |
+----+-------------+-------+------+-----------------+----------+---------+---------------+------+-------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,idx1    | idx1     | 3       | const         |    4 |             |
|  1 | SIMPLE      | ct    | ref  | tests_ID,status | tests_ID | 3       | firstcho.t.id | 1025 | Using where |
+----+-------------+-------+------+-----------------+----------+---------+---------------+------+-------------+

Which, to my understanding, indicates that indexing is used successfully. Thanks to all.

Table structure

>show create table 'tests';
CREATE TABLE `tests` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `active` tinyint(1) NOT NULL DEFAULT '1',
  `content_ID` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `lessons_ID` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `name` varchar(255) NOT NULL DEFAULT '',
  `mastery_score` tinyint(4) unsigned NOT NULL DEFAULT '0',
  `description` text,
  `options` text,
  `publish` tinyint(1) DEFAULT '1',
  `keep_best` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx1` (`lessons_ID`)
) ENGINE=MyISAM AUTO_INCREMENT=171 DEFAULT CHARSET=utf8

>show create table completed_tests;
CREATE TABLE `completed_tests` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `users_LOGIN` varchar(100) DEFAULT NULL,
  `tests_ID` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `test` longblob,
  `status` varchar(255) DEFAULT NULL,
  `timestamp` int(10) unsigned NOT NULL DEFAULT '0',
  `archive` tinyint(1) NOT NULL DEFAULT '0',
  `time_start` int(10) unsigned DEFAULT NULL,
  `time_end` int(10) unsigned DEFAULT NULL,
  `time_spent` int(10) unsigned DEFAULT NULL,
  `score` float DEFAULT NULL,
  `pending` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `users_login` (`users_LOGIN`),
  KEY `tests_ID` (`tests_ID`),
  KEY `status` (`status`),
  KEY `timestamp` (`timestamp`),
  KEY `archive` (`archive`),
  KEY `score` (`score`),
  KEY `pending` (`pending`)
) ENGINE=MyISAM AUTO_INCREMENT=117996 DEFAULT CHARSET=utf8
periklis
  • 10,102
  • 6
  • 60
  • 68
  • No, the explain says that this query **does not use indexes**. If it would do, you would see "Using index" in the `Extra` column. – Kaii Feb 23 '12 at 08:53
  • Please use `SHOW CREATE TABLE ...` for "tests" and "completed_tests" so we can see how your indexes are structured. – Kaii Feb 23 '12 at 08:53
  • Table schemas of tests and completed_tests including definitions of all indexes would be helpful in adjusting my answer more clearly – Simon at The Access Group Feb 23 '12 at 08:55
  • I have a link on the top of the post to the schemas, here it is again: http://pastebin.com/tV06HaUQ – periklis Feb 23 '12 at 08:57

4 Answers4

2

Regarding the other users on high traffic sites that run this and have problems, it is entirely possible that they have database configuration issues outside of your control. That aside, here's some recommendations that would help.

Improving performance

The following assumes a 1:n relationship from tests:completed_tests based on t.id:ct.tests_id where there will always be a row in tests present for n number of rows in completed_tests.

The following additional index is advisable to aid the join

CREATE INDEX `ct_to_tests` ON completed_tests (tests_id,status);

Furthermore, if you are able to change ct.status to be ENUM('deleted','status',..... any other possibilieis ....) (assuming a limited number of statuses available, which is entirely viable) then that too will increase performance as it'll remove the only text search.

The reason I suggest ENUM is simple. status looks to be a field with definition VARCHAR(255) which is populated programmatically and as such will have a limited number of discrete values. If you are able to change the VARCHAR into an ENUM then MySQL will be able to treat it as if it's a numeric field. This is becuase behind the scenes each string in an ENUM is given a numeric index, and it's that index which is used when matching on an ENUM instead of a full string when using VARCHAR, which in turn is far more efficient.

SELECT
    t.lessons_ID, 
    t.content_ID, 
    t.keep_best,
    ct.archive, 
    ct.status, 
    ct.score, 
    ct.users_LOGIN

FROM tests t

INNER JOIN completed_tests ct
    ON ct.status NOT IN ('deleted,'status')
    AND ct.tests_id = t.id

WHERE t.lessons_ID = 10
  • running with SQL_NO_CACHE does not make any difference. Actually, I reset the cache with reset query cache; as I mentioned in my post and it still doesn't make any difference. I will try your suggestion about the query and post back here – periklis Feb 23 '12 at 09:00
  • I'm afraid I got the exact same results: 30 seconds for the first time the query runs, less than one for subsequent runs of the same or similar query. I also completely eliminated the query cache just to make sure. Running explain on the query now gives the same results, only that it lists the 2 indexes you suggested on the "possible keys" column – periklis Feb 23 '12 at 09:16
  • I am running it from the console. You'll see a link with the definitions of the tables on the top of my post, but here it is for convenience: http://pastebin.com/tV06HaUQ – periklis Feb 23 '12 at 09:20
  • You don't have the second key I suggested listed, containing (`tests_id`,`status`) - is this just because it's not been updated? Also if you are able to restrict the length of `status` to be a shorter fixed length field i.e. `CHAR(10)` or better yet change it to an `ENUM` as suggested then that would help. `status` to me indicates hard-coded discrete states so `ENUM` may be viable. `VARCHAR` takes a lot more to search through than a numeric index (which `ENUM` is considered to be, as the internal index is used instead of the text value) – Simon at The Access Group Feb 23 '12 at 09:29
  • yes, they're not updated, that's why your keys are not listed. I'll try setting the status to enum and let you know – periklis Feb 23 '12 at 09:31
0

Finally I resorted to splitting the table to 2, moving the blob object to the second table and joining wherever needed. Unfortunately that involved changing many lines of code.

periklis
  • 10,102
  • 6
  • 60
  • 68
0

try with LEFT JOIN

SELECT ct.archive, ct.status, ct.score, ct.users_LOGIN, 
       t.lessons_ID, t.content_ID, t.keep_best 
FROM completed_tests ct
LEFT JOIN tests t ON (t.id=ct.tests_ID)
WHERE t.lessons_ID=10 AND  ct.status != 'deleted' AND ct.status != 'incomplete' ;
xkeshav
  • 53,360
  • 44
  • 177
  • 245
  • @Kaii no, it's not. It's called theta style and it's equivalent to `INNER JOIN`. `LEFT JOIN` and `INNER JOIN` are quite different, therefore, what diEcho is saying is an improvement from performance point of view and readability. – N.B. Feb 23 '12 at 10:17
  • @N.B. thx for the correction. You're right, using `LEFT` instead of `INNER` join is an improvement. removed my pointless comment. – Kaii Feb 23 '12 at 10:23
0

I would start by trying to find out exactly what the problem is.

Do you have phpmyadmin on the server where the problem occurs? If so, you could submit the query to the "sql" tab with the "profiling" checkbox checked. You'll get a verbose trace of the entire query execution, and maybe that will give you some insight into what you need to solve or how to replicate the problem. Also, try adding indexes to all the columns that you mention in your where clause, if you haven't already done this.

alexg
  • 3,015
  • 3
  • 23
  • 36