How far does optimization in sql queries go?

Question

My background is application programming and there is a guideline that says to not try to "outthink" the compiler e.g. JIT etc when it comes to optimization.
Is this the case also with SQL queries?
I mean I have read that the SQL servers do some kind of execution plan for a query that is expected to be optimal (right?) but do the rearange/modify the actual queries?
Or is the programmer expected to make sure the queries are optimal? E.g. first select and then join etc

@Narnian:My question is general and not specific to a particular database.If it matters I am using `MS` and `MySQL` — Cratylus, Feb 25 '13 at 19:34

score 1 · Answer 1 · answered Feb 25 '13 at 19:50

My experience, which includes working for a database server vendor, is as follows.

First, databases have been highly optimized, and compiled to machine code (often written in C or C++). On modern equipment most operations are so fast, that sub-optimal execution won't be noticed.

However, there are some areas to be aware of.

If you have no indexes, then the database has to do a table scan and that can be slow. Many people only put one field into an index, but you should consider multiple fields as they apply. The explain utilities are there to show you what index it found, and suggest what index would help.

Co-related queries can be slow. Also when you have a where clause with an expression, the database has to evaluate that for each record, and cannot use an index.

Opening a connection is slow, so be sure to reuse the connection and not re-open it for every operation.

However, the biggest issue today is typically the network communication between the database client and the database server. Try to minimize the network turns to the database, and have the database filter results so less data needs to be sent over the network.

`On modern equipment most operations are so fast, that sub-optimal execution won't be noticed` But doesn't this depend on the size of the data? — Cratylus, Feb 25 '13 at 20:10
Here I'm saying that an eight core 2 Ghertz processor can process hundreds of thousands of records in a fraction of a second. Users cannot tell the difference between returning results in 1/30th of a second or 1/10,000th of a second. A lot of people are getting away with indexing just one field and missing out on the optimization of multiple fields in one index, but the database doesn't seem slow, so they don't care. — Marlin Pierce, Feb 25 '13 at 20:15

score 1 · Answer 2 · answered Feb 25 '13 at 19:55

There are things that you want to let the Database do, and there are things that only people can do. Database Management cannot be left up to the database itself. People have to be involved.

Database Optimization is both an art and a science. The Database does a great job of optimizing queries by selecting the best index from those that are already created. However, databases don't automatically create the best indexes. It is the job of a DBA/Programmer to determine what the best indexes are.

An index may make the query extremely fast, but it may require 1 GB of memory. That is not an index you generally want to add. A person can look at the query, though, and realize that a slight reformatting of the query is all that is needed.

A developer with knowledge of the data itself is equipped to make good decisions on what indexes to use and such. It is also good to review your indexes to see if some of them are even being used. Sometimes indexes are created and never used by the database, because a different index is always better or a search is never run that needs the index.

So, databases make great decisions on how to run queries most efficiently based on the indexes that they already have, but it is our job to analyze whether or not the databases have the right indexes and take appropriate action.

score 0 · Answer 3 · answered Feb 25 '13 at 19:36

0

In general, the advice is good. Many more person-years of development go into the creation of the optimization engine than you are going to manage.

That said, there are definitely pitfalls with every database. In some cases, you need to express certain logic in a certain way to make it more efficient. Or, you might need to add hints to get the right execution path.

This is because optimization for SQL is generally much more difficult than optimization for other languages. It requires understanding the data and the distribution of values to arrive at the best solution.

My advice is to write the queries in a way that best expresses what you want done, to write them with naming conventions and indentation that convey the purpose of the query. That way, if you do have to modify the query, you will at least understand what it is doing.

answered Feb 25 '13 at 19:36

Gordon Linoff

1,242,037
58
646
786

Just a nitpick, *great* optimization for other languages also hinges critically on understanding the data and what's frequent and what's not. Often, compilers don't even attempt this, except for implementing special cases of certain well-known language features more efficiently (and often only by a constant factor). Large-scale program transformation improving asymptotic complexity is *at least* as hard for conventional programming languages as for the relational model. Possibly harder. – Feb 25 '13 at 19:39
@delnan . . . I'm not going to disagree with that. I had the impression that the question was more about typical types of optimizations rather than system optimizations. Actually, in that perspective, every SQL query is an attempt at some sort of system optimization -- and one that needs to take into account available memory, disk, and processors. – Gordon Linoff Feb 25 '13 at 19:42

score 0 · Answer 4 · answered Feb 25 '13 at 19:43

0

There are situations where your own knowlege comes in handy. Here are some examples.

1 - you want everything for this month. This is straightforward

where Year(datefield) = 2013
and month(datefield) = 'February'

but this will run faster

where datefield >= '2013-02-01'
and datefield < '2013-03-01'

2 - you want boys named Pat. Sex is indexed, name is not. this is faster

where sex = 'M'
and name = 'Pat'

than this

where name = 'Pat'
and sex = 'M'

3 - in a case construct, list the situation that will occur most often first. This

case when something that almost always happens then 'yes' else 'no' end

will run faster than

case when something that almost never happens then 'no' else 'yes' end

answered Feb 25 '13 at 19:43

Dan Bracuk

20,699
4
26
43

For (1), I can think of counterexamples. (2) is false in any database that I am aware of. The order of simple conditions in a `where` clause does not affect the use of indexes (and some databases wouldn't even use an index on sex because it is a bad idea anyway). The third is simply false, because there is one condition so it takes the same amount of time to determine whether it is true or false. – Gordon Linoff Feb 25 '13 at 20:06
Not sure about the other 2 but why would `1` be better? – Cratylus Feb 25 '13 at 20:13
For 1, I have made personal observations in Oracle databases where the use of a function in the where clause caused the query to run slower. The function in question was trunc. – Dan Bracuk Feb 25 '13 at 20:18

How far does optimization in sql queries go?

4 Answers4