2

I recently took a simple skills test to which I was given the feedback:

"There is one small indexing optimisation which could improve performance."

The skills test involved creating a birthday e-card online app; users sign up, then on their birthday an email is sent to them. I was to presume this is on a Linux server running a mysql database with around 4 million records.

I've tried my best to research further issues with indexing on my database, but to my best research and knowledge, im struggling to find any improvements. I'd really appreciate any pointers here so I can learn where I went wrong;

Database:

CREATE TABLE `birthdayCard`
(
   `Email` VARCHAR(255), 
   `FirstName` CHAR(30), 
   `LastName` CHAR(30), 
   `Dob` DATE, 
   PRIMARY KEY (Email), 
   INDEX(Dob)
 );

Query:

SELECT * FROM `birthdayCard` 
WHERE MONTH(Dob) = MONTH(NOW()) 
AND DAY(Dob) = DAY(NOW());
Prix
  • 19,417
  • 15
  • 73
  • 132
kirgy
  • 1,567
  • 6
  • 23
  • 39

4 Answers4

2

As explained in the comment above, the INDEX(Dob) is not used -- since this is an index on year-month-day. You have to create an index on month-day.

Probably not the most elegant solution, but:

CREATE TABLE `birthdayCard`(`Email` VARCHAR(255), `FirstName` CHAR(30), `LastName` CHAR(30),
                            `Mob` int, `Dob` int, 
                            PRIMARY KEY (Email), INDEX(`Mob`, `Dob`));

See http://sqlfiddle.com/#!2/db82ff/1


For a better( ?) answer: as MySQL does not support computed columns, you might need triggers to populate a "month-day" columns, and have an index on it:

CREATE TABLE `birthdayCard`(`Email` VARCHAR(255), `FirstName` CHAR(30), `LastName` CHAR(30),
                            `Dob` DATE,
                            `Birthday` CHAR(5),
                            PRIMARY KEY (Email), INDEX(`Birthday`));

CREATE TRIGGER ins_bithdayCard BEFORE INSERT ON `birthdayCard`
FOR EACH ROW
    SET NEW.`birthday` = DATE_FORMAT(NEW.`Dob`, "%m%d");

CREATE TRIGGER upd_bithdayCard BEFORE UPDATE ON `birthdayCard`
FOR EACH ROW
    SET NEW.`birthday` = DATE_FORMAT(NEW.`Dob`, "%m%d");

This allow "simple" inserts, preserving if needed the full Dob as in your original example:

insert into birthdayCard (Email, FirstName, LastNAme, Dob) 
   values ("x@y.com", "Sylvain", "Leroux", '2013-08-05');

The SELECT query has to be modified to use the new "search" column:

SELECT * FROM `birthdayCard` WHERE Birthday = DATE_FORMAT(NOW(), "%m%d");

Sett http://sqlfiddle.com/#!2/66111/3

Sylvain Leroux
  • 50,096
  • 7
  • 103
  • 125
  • You should also include a Yob column and/or the complete birthday. – Arjan Aug 05 '13 at 07:00
  • 1
    @Arjan I update my answer to keep the original full `Dob` -- using now a trigger to duplicate *month-day* in a separate column to speed-up search. – Sylvain Leroux Aug 05 '13 at 08:06
0

I don't know about a "small" improvement, but I can think of a big one...

The index can only be used on "naked" fields, so your current query causes an expensive full table scan. You should transform the WHERE expression so the field is not enclosed by the function call:

SELECT * FROM `birthdayCard` 
WHERE
    Dob >= CURDATE()
    AND Dob < DATE_ADD(CURDATE(), INTERVAL 1 DAY);

Which can be satisfied by an index range scan:

ID  SELECT_TYPE     TABLE           TYPE    POSSIBLE_KEYS   KEY     KEY_LEN     REF     ROWS    EXTRA
1   SIMPLE          birthdayCard    range   Dob             Dob     4           (null)  1       Using where
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • How would it work? If a Dob is the birthday date, then it'll be < CURDATE() in most cases. – meze Aug 05 '13 at 07:39
  • 1
    @meze Sorry, I apparently misunderstood your original query. I thought you wanted to get people born today (this year), while in fact you wanted people born on the same date of _any year_. If that is indeed what you want, you'll need to do something along the lines of what [Sylvain Leroux proposed](http://stackoverflow.com/a/18047511/533120). Technically your table violates the principle of [atomicity](http://en.wikipedia.org/wiki/First_normal_form#Atomicity) (and the 1NF), since you are querying for the part of the date, yet storing it as a whole, so the solution is to normalize the table. – Branko Dimitrijevic Aug 05 '13 at 07:55
  • It's not my query btw, so I'm not sure what the OP meant. But I wouldn't say it's not in 1NF in either case. Because from the domain point of view the birthday date is a type that can be used to calculate things like age, m/d or a year of birth. That's purely optimisation and clever DB engines support computed indexes. – meze Aug 05 '13 at 08:44
  • @meze Sorry for confusing you with the OP. As I'm sure you already know, what "atomic" means is not strictly defined. I find it useful to look at how I need to query and change the data and define "atomic" relative to that, which can in some cases depend on the DBMS involved, as you already noted. This is an example how physical and logical design cannot be completely separated. – Branko Dimitrijevic Aug 05 '13 at 09:12
0

I managed to receive some feedback from my test directly from the company, and as their response hasn't been shared so far, I thought I'd share that too as an option.

The problem comes, as highlighted by most, with the DOB. From what was explained to me, when querying a DOB stored as a date as I have done, a query looking for the day and month performs the query similar to a LIKE statement.

This effectually means the stored value 1970-01-01 (the format dates are stored) would be queried similar to:

WHERE Dob LIKE '%01-01' 

This would mean the MYSQL engine would cycle through the unneeded "1970-" part of the value.

The proposed solution would then be to only store (and index) the part of the date needed (month, day). A 4 character length integer would be perfect for this, especially if we performed a query which would select from the left using LEFT, and the RIGHT SELECT function.

Table:

CREATE TABLE `birthdayCard`
(
   `Email` VARCHAR(255), 
   `FirstName` CHAR(30), 
   `LastName` CHAR(30), 
   `Dob` INT(4), 
   PRIMARY KEY (Email), 
   INDEX(Dob)
 );

Query:

SELECT * FROM `birthdayCard` 
WHERE LEFT(Dob, 2) = MONTH(NOW()) 
AND RIGHT(Dob, 2) = DAY(NOW());

It's not that the other methods won't work, or my example was wrong, but speed wise - this proposed method seems to me at least, to be the fastest. In case you're interested; this solution was provided by a hardy SQL veteran and CEO with some 20 programming years behind him.

kirgy
  • 1,567
  • 6
  • 23
  • 39
-1

A few options I would consider.

Creating a column and selecting date_diff(dob, interval - year(dob) YEARS) This gives a date of 0000-08-04 which you can query easily. You can use an trigger to keep new columns in sync.

instead of using a date type. use a char(10). When the column try has been changed update the column to be REVERSE(dob). You can then query for the day and month pretty quick while keeping it in 1 column and keeping the year. This has the advantage of keeping 1 column and all the information

Using some maths - though no method are springing to mind. Im sure there are some

exussum
  • 18,275
  • 8
  • 32
  • 65
  • Converting a date to a char type and then reversing it to do searches on is going to push up the O-notation of inserts and updates. Your first solution is much more elegant. – Namphibian Aug 05 '13 at 00:18
  • inserts and updates happen much less often than selects on most cases. – exussum Aug 05 '13 at 06:10