4

I am creating a database capable of performing SQL queries. I'm using Flex / Bison to create my AST (abstract syntax tree). For example: select * from table where score> 10 * (age * salary)

enter image description here

When i evaluate this AST tree, i traverse the tree, starting from the left and then the right sub-tree.

Comparing with muparser (http://muparser.beltoforion.de/), it is much faster. Is there any other way to perform this processing? If I use Postfix algorithm (http://en.wikipedia.org/wiki/Reverse_Polish_notation) Will I have better performance?

How traditional databases perform this task?

p.magalhaes
  • 7,595
  • 10
  • 53
  • 108

1 Answers1

5

Pretty much to process a SQL query, something has to parse it, process the result of parsing to either execute the query or compile it into query-steps to be executed efficiently later.

With databases, people mostly have performance troubles with executing the query against the data, esp. if the database is large enough so it is mostly disk resident. Parsing the query is rarely the expensive part of the process.

Most queries can be executed in several different ways and achieve the same results. This is especially true if you consider algebraically identical but different query text/(trees) (for your example consider WHERE SCORE/AGE > 10*SALARY).

Making "the" query execute fast is generally a matter of code generation, enumerating or otherwise generating many equivalent queries, estimating their performance based on the set of indexes and statistics, and choosing the best of the bunch for execution.

The compiler literature is full of sophisticated schemes for doing this in non-naive ways. Most of these schemes start with an AST, convert that into some kind of graph representing the program computations, apply chosen "optimizations" [in the case of DB queries, based on estimated data access costs], produce a final high level result and then compile down to machine code using native instructions for straightforward math, and carefully code library procedures for more complicated tasks (e.g, "join").

Reverse polish, while cute, isn't generally a very good scheme for representing a parse tree or analyzing for code generation purposes. (You may note it is equivalent to the AST but in fact it is not as convenient to access). It is acknowledged but pretty much ignored in the compiler literature. The only place it is popular is in building so-called threaded interpreters. While these are "fast", they aren't anywhere near as fast as compiled, optimized code by a factor of 10x or more.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • Thanks for the reply! I found this implementation: http://www.codeproject.com/Articles/45797/Fast-Polymorphic-Math-Parser Probably i can use the same approach, to evaluate my logical-math expression. – p.magalhaes Apr 11 '14 at 21:42
  • That's a cute article, but it is very simple-minded in its code generation methods. In particular it kind of counts on the fact the the x86 floating point instructions use a floating-point stack in the hardware. That scheme would not be able to do serious optimizations that one might consider for SQL queries. You need to go read a good compiler book. – Ira Baxter Apr 11 '14 at 22:56
  • I'm currently not generating machine code. I only evaluate the AST tree. Does commercial DBMS running a SQL code turning into machine code? – p.magalhaes Apr 13 '14 at 01:10
  • Oracle compiles PL/SQL, which includes SQL as a sublanguage, into machine code. It would't surprise me a bit if all the big vendor DBMS did this; users are a voracious consumer of SQL cpu cycles. – Ira Baxter Apr 13 '14 at 04:08
  • Thanks for the answer, Ira Baxter! – p.magalhaes Apr 13 '14 at 22:24