0

I would like to find what is better approach, primary in matter of performance in my databse. Wildcard LIKE, or a range search?

Lets have this table:

| id INT | created DATE | some other columns ... |

If I want to select entries that was created in some month, I can think out two options. One is a pair of comparisons to search in a range:

SELECT * FROM my_table WHERE created >= '2014-09-01' AND created < '2014-10-1'

The second is a wildcard LIKE:

... WHERE created LIKE '2014-09-%'

I have created a dummy table with few hundreds entries, divided between three months. When I run a select with both conditions (with disabled caching and index created on the column), the time was roughly the same - once was faster the first one, another time the second one. The times were between 4.0 and 5.0 ms on an old Atom CPU.

It seems to me that from performance view, it is not much important which one I will use. Is it correct? Or there will appears differences with many thousands of rows?

Thanks

Zopper
  • 132
  • 9
  • 1
    @Giles wrapping the column `created` in a function will render the index unusable. – Arth Sep 18 '14 at 15:01
  • Using YEAR() and MONTH() takes just half of the time of my two options. So this seems to be the best option so far. :-) – Zopper Sep 18 '14 at 15:02
  • 1
    You need more rows. @Giles, nevertheless I wouldn't suggest an alternative that is worse than one of the previous suggestions, i.e. a range scan. – Arth Sep 18 '14 at 15:09
  • I will look more on the date related functions and operations and test it with more rows, then. Tens of thousands will be enough? And thank you all for the answers. – Zopper Sep 18 '14 at 15:21
  • I have tested both methods on my system (~7 million rows) and they can both use the index. – Arth Sep 18 '14 at 15:31

2 Answers2

3

A wildcard prefix pretty much renders it impossible for the optimizer to use an index, and I would guess that performing a range query (especially on date types) is always better than a string comparison.

Plus a few hundered rows is not going to that indicative: differences in data caching, CPU being used by other processes will more than account for a diffreence of ms.

EDIT: but to reiterate, the main thing here is compare dates with dates, not dates with strings. That will come back to bite you at some stage.

e.g.

where created >= STR_TO_DATE('2014-09-01','%Y-%m-%d') 
  and created < STR_TO_DATE('2014-10-01','%Y-%m-%d')
davek
  • 22,499
  • 9
  • 75
  • 95
  • @Giles: not if the column is actually a date type: then an index scan can be used. – davek Sep 18 '14 at 15:01
  • @Giles, no it won't. I think the prefix % and the second postfix % are typos in the question. – Arth Sep 18 '14 at 15:01
  • @Arth: Yes, I fixed it in the question. I really use just suffix wildcard. Copy&Paste error. – Zopper Sep 18 '14 at 15:06
  • @Giles: the index might not be used for other reasons e.g. table is small enough to do a table scan. – davek Sep 18 '14 at 15:08
  • @Giles, the comparison here is a DATE comparison, the string is implicitly converted to a DATE before the comparison is made, and as the column `created` is undoctored on the other side of the equals, the index can be used. – Arth Sep 18 '14 at 15:14
  • @davek: I get a date in a string or numbers going into the database from outside, so there has to be a some string->date conversion. I can do it explicitly, but will it change something? – Zopper Sep 18 '14 at 15:15
  • Your query update is unnecessary in my opinion, it is less readable and the use of the inclusive `BETWEEN ... AND ...` is technically incorrect. – Arth Sep 18 '14 at 15:16
  • @Giles: in my experience, doing it explicitly is almost never a bad idea. – davek Sep 18 '14 at 15:17
  • Arth, davek - No idea how I got that into my head, I just went back and reviewed and you are both right. I withdraw my suggestions. – Giles Sep 18 '14 at 15:18
  • @davek I could just about accept that, although I don't like it here. I would ask that you at least include the date part separators and remove the between! – Arth Sep 18 '14 at 15:20
  • 1
    @Arth: Ok, will do :) – davek Sep 18 '14 at 15:20
  • @Giles No worries, that's why this site is here! – Arth Sep 18 '14 at 15:20
  • @davek As a point of interest, one of the reasons I prefer the implicit conversion here is because all the main temporal types (DATE,DATETIME,TIMESTAMP) are handled even if you change the type of `created`.. true it will still work with DATE, but you are risking two conversions. – Arth Sep 18 '14 at 15:26
2

Having tested both on my system (~7 million rows) both methods can use the index and the difference is negligible.

Howvever, my advice is; use the date range.. it makes more sense on the surface.

Also you are likely to be using ranges elsewhere for data filtering that cannot really be done using a LIKE. For example created >= CURDATE() - INTERVAL 2 DAY.

Using ranges will keep your code consistent.

As an aside I would probably change your query to:

SELECT * FROM my_table WHERE created >= '2014-09' AND created < '2014-10'

To make it obvious that day is not being used.

Arth
  • 12,789
  • 5
  • 37
  • 69