3

Trying to develop something which should be portable between the bigger RDBMS'es.

The issue is around generating and using auto-increment numbers as the primary key for a table.

There are two topics here

  1. The mechanism used to generate the auto-increment numbers.
  2. How to specify that you want to use this as the primary key on a table.

I'm looking for verification for what I think is the current state of affairs:

Unfortunately standardization came late to this area and in some respect is still not implemented (as a mandatory standard). This means it is still in 2013 impossible to write a CREATE TABLE statement in a portable way ... if you want it with a auto-generated primary key.

Can this really be so?

Re (1). This is standardized because it came in SQL:2003. As far as I understand the way to go is SEQUENCEs. I believe these are a mandatory part of SQL:2003, right? The other possibility is the IDENTITY keyword which is also defined in SQL:2003 but that one is - as far as I can tell - an optional part of the standard ... which means a key player like Oracle doesn't implement it... and can still claim compliance. Ok, so SEQUENCEs is the designated portable method for this, right ?

Re (2). Database vendors implement this in different ways. In PostgreSQL you can link the CREATE TABLE statement directly with the sequence, in Oracle you would have to create a trigger to ensure the SEQUENCE is used with the table.

So my conclusion is that without a standardized solution to (2) it really doesn't help much that all the major players now support SEQUENCEs. I would still have to write db-specific code for something as simple as a CREATE TABLE statement.

Is this right?

Standards and their implementation aside I would also be interested if anyone has a portable solution to the problem, no matter if it is a hack from a RDBMS best practice perspective. For such a solution to work it would have to be independent from any application, i.e. it must the database that solves the issue, not the application layer. Perhaps if both the concept of TRIGGERs and SEQUENCEs can be said to be standardized then a solution that combines the two of them would be portable?

Beryllium
  • 12,808
  • 10
  • 56
  • 86
peterh
  • 18,404
  • 12
  • 87
  • 115
  • There are other differences as well, in the system datetime functions and date-time datatypes for example. There is only quasi-portability. – Tim May 23 '13 at 14:35
  • Good point. However what you mention can be circumvented. For example by storing such types as ints or strings. (yes, I know that is *really* stupid). My point is that the issue described in this posting doesn't seem to have a portable solution. The issue that you raise, on the other hand, does. It may be a terrible hack, but it does have a portable solution. – peterh May 23 '13 at 15:01
  • The only statement more complex than CREATE TABLE is SELECT. Yes, you have to write code that tailors itself to the DBMS you are targeting; there is no other solution. All the cross-DBMS products have to be acclimatized to each separate DBMS, dealing with the quirks of each. – Jonathan Leffler May 23 '13 at 15:15
  • I believe that there is no unified solution for auto-increment ID between SQL Server and Oracle. – PM 77-1 May 23 '13 at 15:17
  • 3
    In practical terms, once you move beyond fairly basic queries with generic data types there is little standardization across products (and I don't think any vendor regards fully implementing SQL:2003 as "mandatory"). Using an ORM or DAL is a common 'solution' but it can result in lowest common denominator SQL that ignores a lot of useful functionality provided by specific platforms. There is no simple solution here, you need to precisely define what platforms you want to support and then come up with an effective way to do it. – Pondlife May 23 '13 at 15:18
  • 1
    I do not believe it is possible to create a good LCD database application for Oracle, SQL Server, and PostgreSQL. Autoincrementing primary keys are really the least of your worries. That problem DOES have a fairly simple non-hack solution. – Tim May 23 '13 at 15:39
  • Why is autoincrementing primary keys the least of my worries?. As for all the other portable issues mentioned I would be perfectly happy to use only very basic types, I would be happy to store dates in VARCHARs, I would be happy to trade performance for portability, etc. But I *need* a primary key!! – peterh May 23 '13 at 15:50
  • 2
    Pondlife is correct, if you shoot for 100% generic code, you'll create a real mess (your statement "I would be happy to store dates in VARCHARs is one example). – tbone May 23 '13 at 17:14

1 Answers1

3

As for "portable create table statements": It starts with the data types: Whether boolean, int or long data types are part of any SQL standard or not, I really appreciate these types. PostgreSql supports these data types, Oracle does not. Ironically Oracle supports boolean in PL/SQL, but not as a data type in a table. Even the length of table/column names etc. are restricted in Oracle to 30 characters. So not even the most simple "create table" is always portable.

As for auto-generated primary keys: I am not aware of a syntax which is portable, so I do not define this in the "create table". Of couse this only delays the problem, and leaves it to the insert statements. This topic is connected with another problem: Getting the generated key after an insert using JDBC in the most efficient way. This differs substantially between Oracle and PostgreSql, and if you have ever dared to use case sensitive table/column names in Oracle, it won't be funny.

As for constraints, I prefer to add them in separate statements after "create table". The set of constraints may differ, if you implement a boolean data type in Oracle using char(1) together with a check constraint whereas PostgreSql supports this data type directly.

As for "standards": One example

SQL99 standard: for SELECT DISTINCT, ORDER BY expressions must appear in select list

This message is from PostgreSql, Oracle 11g does not complain. After 14 years, will they change it?

Generally speaking, you still have to write database specific code.

As for your conclusion: In our scenario we implemented a portable database application using a model driven approach. This logical meta data is used by the application, and there are different back ends for different database types. We do not use any ORM, just "direct SQL", because this simplifies tuning of SQL statements, and it gives full access to all SQL features. We wrote our own library, and later we found out that the key ideas match these of "Anorm".

The good news is that while there are tons of small annoyances, it works pretty well, even with complex queries. For example, window aggregate functions are quite portable (row_number(), partition by). You have to use listagg on Oracle, whereas you need string_agg on PostgreSql. Recursive commen table expressions require "with recursive" in PostgreSql, Oracle does not like it. PostgreSql supports "limit" and "offset" in queries, you need to wrap this in Oracle. It drives you crazy, if you use SQL arrays both in Oracle and PostgreSql (arrays as columns in tables). There are materialized views on Oracle, but they do not exist in PostgreSql. Surprisingly enough, it is possible to write database stored procedures not only in Java, but in Scala, and this works amazingly well in both Oracle and PostgreSql. This list is not complete. But so far we managed to find an acceptable (= fast) solution for any "portability problem".

Does it pay off? In our scenario, there is a central Oracle installation (RAC, read/write), but there are distributed PostgreSql installations as localhost databases on each application server (only readonly). This gives a big performance and scalability boost, without the cost penalty.

If you really want to have it solved in the database only, there is one possibility: Put anything in stored procedures, write these in Java/Scala, and restrict yourself in the application to call these procedures, and to read the result sets. This of course just moves the complexity from the application layer into the database, but you accepted hacks :-)

Triggers are quite standardized, if you use Java stored procedures. And if it is supported by your databases, by your management, your data center people, and your colleagues. The non-technical/social aspects are to be considered as well. I have even heard of database tuning people which do not accept the general "left outer join" syntax; they insisted on the Oracle way of using "(+)".

So even if triggers (PL/SQL) and sequences were standardized, there would be so many other things to consider.

Update

As for returning the generated primary keys I can only judge the situation from JDBC's perspective.

PostgreSql returns it, if you use Statement.getGeneratedKeys (I consider this the normal way).

Oracle requires you to specify the (primary key) column(s) whose values you want to get back explicitly when you create the prepared statement. This works, but only if you are not using case sensitive table names. In that case all you receive is a misleading ORA-00942: table or view does not exist thrown in Oracle's JDBC driver: There was/is a bug in Oracle's JDBC driver, and I have not found a way to get the value using a portable JDBC method. So at the cost of an additional proprietary "select sequence.currVal from dual" within the same transaction right after the insert, you can get back the primary key. The additional time was acceptable in our case, we compared the times to insert 100000 rows: PostgreSql is faster until the 10000th row, after that Oracle performs better.

See a stackoverflow question regarding the ways to get the primary key and the bug report with case sensitive table names from 2008

This example shows pretty well the problems. Normally PostgreSql follows the way you expect it to work, but you may have to find a special way for Oracle.

Community
  • 1
  • 1
Beryllium
  • 12,808
  • 10
  • 56
  • 86
  • 1
    +1 Good insights. Oracle and PostgreSQL both support RETURNING for getting the generated key though don't they? – David Aldridge May 23 '13 at 17:17
  • @David: Thanks. I've added an update regarding your question. – Beryllium May 24 '13 at 08:24
  • Thank you for that very elaborate answer. I think the way for me to go would be to use a database abstraction layer. – peterh May 24 '13 at 08:59
  • Something like [DDLUtils](http://db.apache.org/ddlutils) might solve my issues with having database independence on the DDL statements. The table structures are described in some XML format and then the tables are generated in a way most appropriate for the target database. I'll look at it some more. – peterh May 24 '13 at 09:32
  • @nolan6000: You're welcome. Yes, I agree, some sort of abstraction layer is very helpful. I do not know DDLUtils, but the logical data model I mentioned is actually defined in an XML file :-). This information is used to generate various files (constants for column names, data transfer classes etc.), and it is used to upgrade existing installations. For example, it is used to detect missing columns, and the library will add these – Beryllium May 24 '13 at 10:39