A good strategy for writing a database abstraction layer in C

Question

I need to write a database abstraction layer in C for the database access API of PostgreSQL (libpq) and possibly some other relational database management system.

I am considering writing my own abstraction of the existing functions in pqsql and possibly other database systems. What would be the best strategy to consider for writing such a database abstraction API?

So far my prefered plan of writing this would probably be:

Define headers, functions and wrappers for any particular database systems of interest.
When the first connection call (function) is used, check and determine for which relational database management system it is suited for.
Include the right set of methods and definitions for database abstraction.

But this way of handling things would probably only allow to use one relational database management system at a time. Is this common for a database abstraction layer?

What other options could be considered?

Re one DB at a time ... it depends on what you mean by one at a time. Most handle DB type selection on a connection-by-connection basis, I've seen some do it at dynamic linking time by picking which backend implementation library to use. The incremental effort from "allow switching between different DBs by recompile/relink/link path change" to "allow switching different DBs at runtime" is small vs the large effort to allow switching DBs at all, though, so most will allow runtime switching. — Craig Ringer, Apr 19 '12 at 01:18
Oh: When you say DB abstraction layer, do you just want a common C API for connecting, issuing requests and getting results (like ODBC, mostly)? Or do you also want the API to handle SQL language level query portability too (like Hibernate)? Do you want your API to treat the DB as a simple get/put row-store, or do you need/want real query capabilities with joins, grouping, etc? — Craig Ringer, Apr 19 '12 at 01:22

Craig Ringer · Accepted Answer · 2012-04-19T01:38:03.113

Don't reinvent this wheel.

It's been done already, repeatedly, and while the most popular system (ODBC) is undeniably ugly it already has abstractions written for every database you've ever heard of. With the UnixODBC project it exists on UNIXes and POSIX UNIX-alikes like Mac OS X, too.

A somewhat less ugly alternative with much narrower database support is libdbi. You won't get drivers from vendors for libdbi like you do for ODBC.

How repeatedly?

If C++ is OK, another DB abstraction layer I've heard very good things about is that offered by Nokia (once TrollTech), the Qt framework's database interface. It has the major advantage of being already *written* and tested. The main downside is that it's a bit clumsy if your app core isn't also C++ and Qt based.

Other C++ options, ones I haven't personally used, are sqlapi++, dtemplatelib and SOCI.

If by this point you're wondering why there are all these database abstraction layers written for and in C++, maybe that's a hint. If you write a DB abstraction layer in C, I bet you'll land up reinventing C++ virtual methods via structs of function pointers, and probably end up creating something almost as ugly as GObject, in which case you might as well use libgdb / gnome-db. To avoid the fake-virtual-method junk you can make your DB layer selectable at link time via the runtime library path and just provide a bunch of plain old functions, but I don't know why you'd do this when so many existing options are around.

BTW, if it turns out you don't really need DB abstraction at all, only a nicer interface (since you're mainly targeting Pg), check out libpqtypes (plain C, no C++, very portable), otherwise known as "what libpq should be". Even if you're going to add an abstraction layer it may be worth doing it over libpqtypes not just libpq.

If ODBC, libdbi, libpqtypes, Qt, etc aren't suitable for you, you need to define why, and create a more specific set of criteria to satisfy.

Alternately: If your design permits, you might want to consider doing your database access and related work in an embedded higher level language with a DB abstraction layer like Python (psycopg), Ruby, Perl, or Java (JDBC). All of these have their own widely adopted and well tested database abstraction layers that you can benefit from while still writing much of your program in C. Or, for that matter, you can just write performance-critical bits of your program in C as a library and use it from the language of your choice with a sane database abstraction layer already written for you.

Database abstraction is hard. Really hard. Database abstraction that isn't horrible to use, performs decently, and is flexible enough to use the database as more than a dumb row store is even harder - and sooner or later, you're going to need more than a dumb row store.

I do __not__ intend to write something like ODBC. My primary target is PostgreSQL. So far I do not need the 31337 functionality that comes from full-blown database abstraction layers. I also intend to write the C program in a more "new", non traditional fashion. ODBC is __ugly__ thus it does not fit the cause for me. — user1254893, Apr 18 '12 at 15:03
@JuozasDomarkas OK, so to decide what to do you need to start establishing criteria. The what, why, and how of your needs and how they're not satisfied by existing options. Once you've set that out for yourself, you'll be able to start sketching what you want to do. Right now, your question doesn't have enough detail to tell what you want; all I can tell so far is that you want a database abstraction layer but it shouldn't be ODBC. — Craig Ringer, Apr 19 '12 at 01:14
@JuozasDomarkas Also: You don't need all those leet features - *yet*. I've seen a fair few projects start with the minimal and simple DIY only to land up having to switch to something more full featured or duplicate masses of work re-creating equivalent functionality. Hell, I've done it myself, I know it's tempting when the established option seems big, fat and excess to requirements. Now's the time to avoid going down that path though. If you don't like ODBC, look at libdbi; if you don't like that, figure out exactly why and what you *do* need/want/like. — Craig Ringer, Apr 19 '12 at 01:16
Updated to mention yet more existing options you can use to compare when deciding how to make your new system unique and better. — Craig Ringer, Apr 19 '12 at 01:38

score 3 · Answer 2 · answered Apr 18 '12 at 08:51

3

I'm not sure if you're looking for something more specific or just generic idea. The usual way I have seen similar things done is by defining a common struct containing function pointers to all relevant functions.

struct dbms {
    int (*connect)(const char *cs);
    int (*disconnect)(void);
    int (*query)(...);
    ...
}

Then for each DBMS you fill this array with pointers to functions relevant for that DBMS. You then save this pointer to propely initialized struct somewhere, and use it to call (directy or from some kind wrapper functions)...

Whenever I see that I wonder why not simply use c++ and virtual methods.

answered Apr 18 '12 at 08:51

dbrank0

9,026
2
37
55

1

A variation on the above is to put a pointer to a vtable containing the function pointers in the dbms struct. That saves some space per dbms object and makes it even more like C++ :) – JeremyP Apr 18 '12 at 10:38
1

You can get your function pointers by dlopen()ing a lib and dlsym()ing a single function that creates and returns a struct of function pointers for that particular DB backend. That's the usual way to do it; it lets you avoid your main code having to know anything at all about the individual database backend(s) available. – Craig Ringer Apr 19 '12 at 01:40

A good strategy for writing a database abstraction layer in C

2 Answers2