0

I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:

SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM  `channelitem` 
WHERE ciid NOT 
IN (

SELECT ciid
FROM  `uninet_channelitem_read` 
WHERE uid =  '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY  `channelitem`.`creationdate` DESC 
LIMIT 0 , 100

The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.

Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.

Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?

Here is the latest version of the query, derived from suggestions from @Gordon, below

SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM  `channelitem` 
LEFT JOIN (

SELECT ciid, dateRead
FROM  `uninet_channelitem_read` 
WHERE uid =  '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND  `cid` 
IN ( 6648, 329, 323, 6654, 6647 ) 
ORDER BY  `channelitem`.`creationdate` DESC 
LIMIT 0 , 100

Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:

CREATE TABLE IF NOT EXISTS `channelitem` (
  `newsversion` int(11) NOT NULL DEFAULT '0',
  `cid` int(11) NOT NULL DEFAULT '0',
  `ciid` int(11) NOT NULL AUTO_INCREMENT,
  `description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `url` varchar(222) DEFAULT NULL,
  `creationdate` datetime DEFAULT NULL,
  `urgent` varchar(10) DEFAULT NULL,
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `author` varchar(255) NOT NULL,
  PRIMARY KEY (`ciid`),
  KEY `newsversion` (`newsversion`),
  KEY `cid` (`cid`),
  KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;

CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
  `ciid` int(11) NOT NULL,
  `uid` int(11) NOT NULL,
  `dateRead` datetime NOT NULL,
  PRIMARY KEY (`ciid`,`uid`),
  KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Timothy R. Butler
  • 1,097
  • 7
  • 20
  • The query seems to speed up quite a bit if I shift "WHERE uid = '1030'" into a second ON condition for the JOIN. It seems to work -- is this a legitimate strategy or will there be drawbacks I'm not immediately thinking of? – Timothy R. Butler Jul 21 '13 at 03:04

2 Answers2

1

The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing

Legion
  • 796
  • 7
  • 12
  • Thanks, that brought the query time down to .5 sec., which is indeed much better and is now bearable, although I still suspect slower than it was. I tried added another index to `uninet_channelitem_read` on the `uid` to see if that helped, but that actually slowed things down again to about 1.3 sec. Any other ideas what I might optimize? Thanks for your help! – Timothy R. Butler Jul 11 '13 at 00:04
  • I had a PRIMARY key on the read table that combined the uid and ciid columns, because there should only be one entry per that combination. I tried removing it, which made the query slower, but upon re-adding it, the speed has now improved to .13 sec. Could something have been wrong with the index? – Timothy R. Butler Jul 11 '13 at 00:09
  • I know that primary keys are automatically indexed to improve query time. I can't think of any other issues except maybe if you are catching information about the structure of your tables for your website and the cache is deviant from the actual structure... it might have automatically re-cached with an updated schema. – Legion Jul 11 '13 at 14:47
  • That makes sense. Thanks, @Legion. Could it be I need to some how customize the settings of the index, itself to make it more efficient? This time it grew to about a 6 second query... when I recreated the indexes again, it dropped down to 2 seconds. The half second speed seems to be distant history now. – Timothy R. Butler Jul 21 '13 at 01:00
1

It never hurts to try the left outer join version of such a query:

SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM  `channelitem` ci left outer join
       (SELECT ciid
        FROM  `uninet_channelitem_read` 
        WHERE uid =  '1030'
       ) cr
       on ci.ciid = cr.ciid
where cr.ciid is null and
      ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC 
LIMIT 0 , 100

This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, createddate).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for the suggestion, @Gordon! After recreating the indexes while fiddling around, it became hard to test things because the problem was temporarily fixed, but now it is back. This revised query appears to run at about the same speed for me, unfortunately. Referring to the indexes, do you mean a separate index for `cid`, `ciid` and `createddate` on channelitem? – Timothy R. Butler Jul 21 '13 at 00:59
  • @TimothyR.Butler . . . No, I do not mean a separate index for each. I mean one index with the column in that order. – Gordon Linoff Jul 21 '13 at 12:36
  • Thanks, @Gordon! That helps. It is *much* faster -- around .32 sec. I've also noticed if I put my `WHERE uid= '1030'` as part of a JOIN ON (`ON (ci.ciid=cr.ciid AND cr.uid='1030')`) condition with a simpler join (`JOIN 'uninet_channelitem_read' cr`) instead of one with a subquery, that reduces it further to .02 sec. Is that a legitimate optimization or am I missing some long term problem there? Thanks again! – Timothy R. Butler Jul 21 '13 at 21:31
  • @TimothyR.Butler . . . That is a legitimate optimization. For performance reasons, I should not have left the subquery in the final result. – Gordon Linoff Jul 21 '13 at 21:42
  • Thanks again, @Gordon. It is interesting, the original query (with the subquery) seems at least at times faster than the join with subquery, but when the JOIN has the condition moved into the ON statement, it is incredibly fast. Even faster with your suggested index changes. I really appreciate your help! – Timothy R. Butler Jul 21 '13 at 22:28