Modularizing SQL even if only syntactic sugar

Question

Is there a way to modularize SQL code so that is more readable and testable?

My SQL code often becomes a long complicated series of nested joins, inner joins, etc. that are hard to write and hard to debug. By contrast, in a procedural language like Javascript or Java, one would pinch off discrete elements as separate functions you would call by name.

Yes, one could write each as entirely separate queries, stored in the database, or as stored procedures, but often I don't want to change/clutter the database, just query it is fine, especially if the DBA doesn't wish to grant write permissions to all users.

For instance, conceptually a complex query might be easily described in pseudocode like this:

(getCustomerProfile) 
left join 
(getSummarizedCustomerTransactionHistory) 
using (customerId) 
left join
(getGeographicalSummaries) 
using (region, vendor)
...

I realize that a lot is written on the topic from a theoretical vantage (a few links below), but I'm just looking for a way to make the code easier to write correctly, and easier to read once written. Perhaps just syntactic sugar to abstract the complexity from sight, if not from execution, that compiles down in the literal SQL I'm trying to not look at. By analogy...

Stylus: CSS ::
CoffeeScript : Javascript ::
SAS Macro language: SAS language ::
? : SQL

And if the specific SQL flavor matters, most of my work is in PostgresQL.

http://lambda-the-ultimate.org/node/2440

Code reuse and modularity in SQL

Are Databases and Functional Programming at odds?

Urgggghhh! To do this woudl be to lose one of the most imporatnt things you need from SQl which is performance. It is irrelevant how hard it is for you to read, what is relevant is how easy it is for the server to put together a decent plan to get the data most efficiently. Don't try to apply OOP concepts to SQL. (And BTW I don't find SQL hard to debug at all, if you do then clearly you need to learn SQL better not try to abstract it.) — HLGEM, Feb 04 '13 at 18:32
@HLGEM . . . Actually, I think it could help the optimizer. Take, for instance, the fact that SQL Server re-executes each reference to a CTE, even though the query has given the *little* hint that the CTE is going to be re-used multiple times. — Gordon Linoff, Feb 04 '13 at 18:49

score 6 · Accepted Answer · edited Mar 27 '13 at 17:59

In most databases, you can do what you want using CTEs (Common Table Expressions):

with CustomerProfile as (
      getCustomerProfile
     ),
     SummarizedCustomerTransactionHistory as (
      getSummarizedCustomerTransactionHistory
     ),
     GeographicalSummaries as (
      getGeographicalSummaries
     )
select <whatever>

This works for a single query. It has the advantage that you can define a CTE once, but use it multiple times. Also, I often define a CTE called const that has constant values.

The next step is to take these constructs and create views from them. This is especially useful when sharing code among multiple modules, to ensure constant definitions. In some databases, you can put indexes on the views to "instantiate" them, further optimizing processing.

Finally, I recommend wrapping inserts/updates/deletes in stored procedures. This allows you to do have a consistent framework.

Two more comments though. First, SQL is often used for transactional or reporting systems. Often, once you get the data in the right format for the purpose, the data speaks for itself. You example might just be asking for a data mart that has three tables devoted to those three subject areas, which get populated once per week or once per day.

And, SQL is not an idea language for abstraction. With good practice, naming conventions, and indentation style, you can make it useful. I sorely miss certain things from "real" languages, such as macros, error handling (why data errors are so hard to identify and handle is beyond me), consistent methods for common functionality (can someone say group string concatenation), and some other features. That said, because it is data centric and readily parallelizable, it is more useful for me than most other languages.

Aha, CTE is the idea i was looking for. You hit the issue right on the head, which is that we're using the database more and more as our primary analytic platform, and exploratory/research data-reduction queries we might have written in SAS we're now pushing to the database. Absolutely concur on the role of intermediate tables/views, if one has write permission and the query deserves it. — prototype, Feb 04 '13 at 19:57
Also known as a 'with' expression in PostgresQL http://www.postgresql.org/docs/9.1/static/queries-with.html, "These statements..can be thought of as defining temporary tables that exist just for one query. " — prototype, Feb 05 '13 at 03:42

score 1 · Answer 2 · answered Feb 04 '13 at 18:24

The issue here is you need to think about data in a relational way. I do not believe this type of abstraction correctly fits into the relational model. In terms of making SQL modular, that is what stored procedures and/or functions are for. Notice how these have the same characteristics as methods do in Java. You can abstract out that way. Another way is to abstract the data that is what you care about into materialized views. By doing this you can put a regular view (see virtual function) over top of these materialized views which allow you to test the structure of the data without touching the "raw" tables.

Modularizing SQL even if only syntactic sugar

2 Answers2