2

Given this situation, I have two tables with the following existing indices:

CREATE TABLE tblBusses 
(
    intID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    strBus VARCHAR(100) NOT NULL DEFAULT ""
);
CREATE INDEX indBussesBus ON tblBusses (strBus );
CREATE INDEX indBussesID ON tblBusses (intID);

CREATE TABLE tblPersons
(
    intID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    strPerson VARCHAR(100) NOT NULL DEFAULT "",
    intBusID BIGINT UNSIGNED NOT NULL,
    intSexID BIGINT UNSIGNED NOT NULL,
    datCreated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX indPersonsPerson ON tblPersons (strPerson );
CREATE INDEX indPersonsSex ON tblPersons (intBusID);

...And tblPersons(intBusID) is a foreign key that refers to tblBusses(intID), and tblPersons(intSexID) is a foreign key to a different, obvious table. In this case, tblPersons contains 50 Million+ records, and tblBusses contains approx 1 million records.

The persons table is being populated in steps (with a cronjob), and I need to get the next 100 busses from the tblBusses that are not yet referred to in the tblPersons with a selected sex, to perform a number of populating (insert) queries. Considering the numbers, this is a little challenge. The populating queries are fast enough, but the pre- bus selection queries are a challenge and too slow.

Currently, to select the next 100 busses, I maintain an assumption that I populate tblPersons with increasing Bus IDs, so I search for the MAX(tblPersons.intBusID), then I get the next 100 tblBusses records:

SELECT max(intBusID) AS intMaxBus 
FROM tblPersons 
WHERE intSexID={$intSexID};

SELECT intID, strBus 
FROM tblBusses 
WHERE intID>{$intMaxBus} 
ORDER BY intID ASC 
LIMIT 100;

The second query takes 1900 microseconds (fast enough). The first takes 2.5 seconds, which is too slow. Certainly for the foreseeable future.

How can I make this faster? I don't care if someone comes up with a better way of resolving my entire problem (we're all learners), or an improved solution to the MAX() query, 2.5 seconds is too slow.

One solution that I can imagine, is for mysql to somehow continuously cache the max(intBusID) position. I would have thought this happened anyway with the index, but apparently with 2.5 seconds, that is not working well enough. Any hints?

DoNotArrestMe
  • 1,285
  • 1
  • 9
  • 20
Florian Mertens
  • 2,418
  • 4
  • 27
  • 37
  • Also, on an irrelevant note, creating an index on tblPersons(intSexID) slows the 'first query' down twice, this I tried out (in a belief that the slow speed was also caused by some effect of the WHERE part. – Florian Mertens Feb 03 '14 at 03:38
  • Can you print the result of running EXPLAIN for both queries? – omgitsfletch Feb 03 '14 at 03:38

2 Answers2

4

Try this:

SELECT intBusID AS intMaxBus
FROM tblPersons
WHERE intSexID={$intSexID}
ORDER BY intBusID DESC
LIMIT 1

I think this should use the index on intBusID to scan the rows until it finds one whose sex matches, and return it immediately. MySQL wasn't able to figure out that your query is equivalent, so it created a temporary table of all the rows whose sex matches and then found its max ID.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • How the proposed query can be better (assuming there are no indexes)? – zerkms Feb 03 '14 at 03:44
  • When you use `ORDER BY` and `LIMIT` on an indexed column, MySQL uses the index to scan the rows in that order, and stops scanning when it reaches the limit. – Barmar Feb 03 '14 at 03:46
  • when you use `MAX` + `WHERE` on an indexed columns (`(intSexID, intBusID)` in this case) it would do exactly the same – zerkms Feb 03 '14 at 03:48
  • @zerkms yes, there is an index on tblPersons.intBusID. Barmar, You mean ORDER BY intBusID DESC ? IF yes, your query was less than 0.01 seconds !! I mean wow! – Florian Mertens Feb 03 '14 at 03:49
  • @zerkms But he doesn't have a composite index on both columns. – Barmar Feb 03 '14 at 03:50
  • @Barmar: if he doesn't - both queries (original and your) would still behave in the same way. To sort the dataset it needs the whole dataset to be retrieved – zerkms Feb 03 '14 at 03:52
  • Obviously not, he just said that my query is faster. Logically they're equivalent, but obviously mysql doesn't detect this and optimizes them differently. – Barmar Feb 03 '14 at 03:53
  • @Barmar: "faster" isn't descriptive. There are multiple places where OS and mysql can use caching to improve performance. So without the execution plan I wouldn't accept that fact :-) – zerkms Feb 03 '14 at 03:54
  • he said the original query takes 2.5 seconds, my query takes 0.01 seconds. That's an enormous difference. – Barmar Feb 03 '14 at 03:55
  • I don't believe it :-) – zerkms Feb 03 '14 at 03:56
  • @zerkms Me neither. But its on my screen atm lol. Just as yourself, I also think that my screen is lying lolll – Florian Mertens Feb 03 '14 at 04:17
  • Does `EXPLAIN` show the reason for the improvement? – Barmar Feb 03 '14 at 04:19
  • I suspect that this query will still be slow if there is no record matching the conditional `intSexID={$intSexID}`, or if the matched record has a low value of `intBusID` – vidang Feb 03 '14 at 04:25
  • I suspect you're right. My optimization assumes that the sexes are well distributed through the data, so in practice it won't be a problem. – Barmar Feb 03 '14 at 04:31
  • I ended up using the other answer partly because there was also the mentioning of composite queries, which I had not yet thought of. I did like the simplicity of this answer, and how I overlooked that. Thank you! – Florian Mertens Feb 08 '14 at 11:08
  • Did you do performance tests or use `EXPLAIN` to see which one will perform better? My answer can be used as the subquery in your composite query just as well, I thought you were only looking for this part of the answer. – Barmar Feb 08 '14 at 18:36
4

This is the max() query:

SELECT max(intBusID) AS intMaxBus 
FROM tblPersons 
WHERE intSexID={$intSexID}

You need an index on tblPersons(intSexID, intBusID);

You can combine the queries as:

SELECT intID, strBus 
FROM tblBusses 
WHERE intID > (SELECT coalesce(max(intBusID) , 0) AS intMaxBus 
               FROM tblPersons 
               WHERE intSexID = {$intSexID}
              )
ORDER BY intID ASC 
LIMIT 100;

This query will benefit from an index on tblBusses(intID, strBus).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • ..So composite indices is the answer then? I haven't seen this before, I am looking this up, but the combined query, with the suggested composite index on tblPersons, and without the suggested composite index on tblBusses, runs under 0.05 seconds. That is impressive ! – Florian Mertens Feb 03 '14 at 04:16
  • I ended up using this answer. Note that I had to add a IFNULL(... , 0) around your inner SELECT query in case there are no records at start. – Florian Mertens Feb 08 '14 at 11:07