
We are planning a new project (a complete relaunch) of a web application in PHP (Symfony 2) with PostgreSQL. Currently we use PHP and MySQL (MyISAM). -> webapp

The current and the new webapp depend on another system (.NET) with its own database (MS SQL 8 / 2000), which will not be modified (neither changed nor merged with our database) anytime soon, because it carries a complex workflow, the whole megillah. -> legacy system
BTW: the biggest table has 27 million rows in total.

Most of the data/tables will be transferred multiple times per day from the legacy database to the webapp database. For the new webapp we have already redesigned most of the database schema, so we now have an almost normalised schema (the schema of the legacy database is massively redundant and really messy).

Currently the transfer job tries to insert each row. When that raises an exception with a specific error code, we know the row is already there and do an update instead. We do this for performance (no select before the update).
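A minimal sketch of that insert-first pattern, using Python with an in-memory SQLite database for illustration (table and column names are made up; in Postgres you would catch the unique-violation error, SQLSTATE 23505, instead of SQLite's IntegrityError):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id TEXT PRIMARY KEY, payload TEXT)")

def transfer_row(conn, key, payload):
    """Try the INSERT first; on a duplicate-key error fall back to UPDATE."""
    try:
        conn.execute("INSERT INTO item (id, payload) VALUES (?, ?)",
                     (key, payload))
    except sqlite3.IntegrityError:
        conn.execute("UPDATE item SET payload = ? WHERE id = ?",
                     (payload, key))

transfer_row(conn, "123456P32", "first import")
transfer_row(conn, "123456P32", "second import")  # hits the UPDATE path
```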

For the new webapp schema we still want to use the same primary IDs as in the legacy database. But there are some problems, one of them: some tables have primary keys that look like integers, but aren't. Most of the rows have plain integers like 123456, but then there are some rows with a character in them, like 123456P32.

Now there are two options for the new schema:

  1. Use a string type for the PK and risk performance issues
  2. Use an integer type for the PK and convert the legacy keys. The conversion could look like this (character by character):

    legacy      new
    --------------------------
    0           10
    1           11
    2           12
    .           ..
    9           19
    a           20
    b           21
    .           ..
    y           45    
    z           46
    A           50 (not 47, because the arity of the second digit is 'clean' with 50)
    B           51
    .           ..
    Z           76
    

The legacy PK 123 would be converted into 111213, so the length doubles. Another example: 123A9 -> 1112135019. Because every character maps to exactly two digits, the conversion is also reversible.
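To make the scheme concrete, here is a rough sketch of the encoder and decoder in Python (function names are made up for illustration):

```python
def encode_char(c: str) -> int:
    # digits 0-9 -> 10-19, a-z -> 20-45, A-Z -> 50-76 (gap at 47-49
    # so that uppercase starts on a 'clean' 50)
    if c.isdigit():
        return 10 + int(c)
    if c.islower():
        return 20 + ord(c) - ord("a")
    if c.isupper():
        return 50 + ord(c) - ord("A")
    raise ValueError(f"unsupported character: {c!r}")

def encode_key(legacy: str) -> int:
    """Map a legacy key to an integer, two digits per character."""
    return int("".join(str(encode_char(c)) for c in legacy))

def decode_key(new: int) -> str:
    """Reverse the mapping by splitting the number into two-digit pairs."""
    s = str(new)
    out = []
    for i in range(0, len(s), 2):
        code = int(s[i:i + 2])
        if 10 <= code <= 19:
            out.append(str(code - 10))
        elif 20 <= code <= 45:
            out.append(chr(ord("a") + code - 20))
        elif 50 <= code <= 76:
            out.append(chr(ord("A") + code - 50))
        else:
            raise ValueError(f"invalid pair: {code}")
    return "".join(out)
```

Every pair lies in 10..76, so the first digit is never zero and the pairwise split in `decode_key` is unambiguous.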

My first doubt was that the sparse PKs would cause performance issues, but with a b-tree (self-balancing) index, which is the default index type in Postgres, it should be fine.

What do you think? Have you some experience with similar systems with legacy dependencies?

timaschew

2 Answers

  • PostgreSQL performance with text PK isn't that bad — I'd go with it for simplicity.

  • You didn't tell us how long these keys can be. Using your conversion, an ordinary integer would be enough only for a 4-character key, and bigint only for a 9-character one.
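That limit is easy to verify: the worst case is a key of all Zs, since 'Z' encodes to 76, the largest two-digit pair. A quick Python check against the PostgreSQL integer ranges:

```python
MAX_INT = 2**31 - 1     # PostgreSQL integer: 2147483647
MAX_BIGINT = 2**63 - 1  # PostgreSQL bigint: 9223372036854775807

def worst_case(length: int) -> int:
    # 'Z' encodes to 76, the largest pair, so "ZZ...Z" gives the
    # biggest possible converted value for a key of this length
    return int("76" * length)

assert worst_case(4) <= MAX_INT < worst_case(5)      # integer fits 4 chars
assert worst_case(9) <= MAX_BIGINT < worst_case(10)  # bigint fits 9 chars
```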

Tometzky

Use CREATE DOMAIN to isolate the proposed data types. Then build and test a prototype. You're lucky; you have no shortage of valid test data.

create domain legacy_key as varchar(15) not null;

create table your_first_table (
  new_key_name legacy_key primary key
  -- other columns go here.
);

To test a second database using integer keys, dump the schema, change that one line (and the name of the database if you want to have them both at the same time), and reload.

create domain legacy_key as bigint not null;

You should think hard about storing the legacy system's primary keys exactly as they are. Nothing to debug, which is great peace of mind. If you must convert, be careful with values like '1234P45'. If that letter happens to be an E or a D, some applications will interpret it as an exponent.
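The exponent pitfall is easy to demonstrate; Python's float parser, like many numeric parsers, happily accepts 'E' as scientific notation:

```python
# '1234E5' looks like a legacy key but parses as 1234 * 10**5
assert float("1234E5") == 123400000.0

# A letter outside the exponent syntax fails loudly instead
try:
    float("1234P5")
    raised = False
except ValueError:
    raised = True
assert raised
```

So a key that should have stayed a string can silently become a number if it passes through a lenient parsing step.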

You shouldn't have performance problems due to key length if you're using varchar() keys of 10 or 15 characters, especially with version 9.2. Read the documentation about indexes before you start. PostgreSQL supports more kinds of indexes than most people realize.

Mike Sherrill 'Cat Recall'